quantity. These are particularly useful in treating cellulosic materials, including cotton-containing fabrics, as detergent additives, and in aqueous compositions. We also provide genomic DNA which can be used in recombinant expression vectors and expression systems to produce enhanced alkali and/or temperature stability properties in cellulases other than those specifically described.
Description

A FIELD OF THE INVENTION 

[0001] The present invention is directed to improved methods for treating cellulosic materials, including cotton-containing fabrics and non-cotton-containing cellulose fabrics, with novel truncated cellulase enzymes. In addition, this invention relates to novel truncated cellulase enzymes which exhibit cellulase activity, DNA constructs encoding the enzymes, cellulolytic agents comprising the enzymes, and detergent and water purifying or conditioning compositions containing the enzymes. In particular, this invention provides thermophilic cellulases isolated from a thermophilic anaerobic bacterial strain found in New Zealand. The cellulase genes from this organism are identified and sequenced, and the cellulases expressed from this bacterium are shown to be particularly useful in the abrasion of denim and in the manufacture of clothing having a "stone wash" look. Most importantly, the cellulases of this invention possess unexpected proteolytic and chemical stability, as well as thermal and pH stability in hot alkaline solutions, making them important as laundry detergent additives in many industrial and home washing applications.


B BACKGROUND OF THE INVENTION 

[0002] During or shortly after their manufacture, cotton-containing fabrics can be treated with cellulase enzymes in order to impart desirable properties to the fabric. For example, in the textile industry, cellulase has been used to improve the feel and/or appearance of cotton-containing fabrics, to remove surface fibers from cotton-containing knits, and to impart a "stone washed" appearance to cotton-containing denims and the like.

[0003] Clothing made from cellulose fabric, such as cotton denim, is stiff in texture due to the presence of sizing compositions used to ease manufacturing, handling and assembling of clothing items. Such clothing typically has a fresh, dark-dyed appearance. One desirable characteristic of indigo-dyed cloth is the alternation of dyed threads with white threads, which gives denim a white on blue appearance.

[0004] After a period of extended wear and laundering, clothing items, particularly denim, can develop localized areas of color variation, in the form of a lightening in the depth and density of color, in the clothing panels and on the seams. In addition, a general fading of the clothes, some puckering in the seams and some wrinkling in the fabric panels often appear. Additionally, after laundering, sizing is substantially removed from the fabric, resulting in a softer feel. In recent years such a distressed or "stonewashed" look, particularly in denim clothing, has become very desirable to a substantial proportion of the public.

[0005] Previous methods for producing the distressed look included stonewashing of a clothing item or items in a large tub with pumice stones having a particle size of about 1 inch by 1 inch, together with smaller pumice particles generated by the abrasive nature of the process. Typically the clothing item is tumbled with the pumice while wet for a sufficient period that the pumice abrades the fabric, producing localized abraded areas of lighter color in the fabric panels and similar lightened areas in the seams. Additionally, the pumice softens the fabric and produces a fuzzy surface similar to that produced by extended wear and laundering of the fabric. This method also enhances the desired white on blue contrast described above.

[0006] The use of pumice stones has several disadvantages, including overload damage to machine motors, mechanical damage to transport mechanisms and washing drums, environmental waste problems from the grit produced, and high labor costs associated with manually removing the stones from the pockets of the garments.
[0007] In view of the problems associated with pumice stones in stonewashing, cellulase solutions are used as a replacement for the pumice stones under agitating and cascading conditions, i.e., in a rotary drum washing machine, to impart a "stonewashed" appearance to the denim.

[0008] Cellulases are enzymes which hydrolyze cellulose (β-1,4-D-glucan linkages) and produce as primary products glucose, cellobiose, cello-oligosaccharides and the like. Cellulases are produced by a number of microorganisms and comprise several different enzyme classifications, including those identified as exo-cellobiohydrolases (CBH), endoglucanases (EG) and β-glucosidases (BG). Enzymes within these classifications can be separated into individual components. The complete cellulase system, comprising CBH, EG and BG components, acts synergistically to convert crystalline cellulose to glucose.

[0009] A problem with using complete cellulase compositions from previously described microorganism sources for stonewashing dyed denim is the incomplete removal of colorant caused by redeposition, or "backstaining", of some of the dye back onto the cloth during the stonewashing process. In the case of denim fabric, this causes recoloration of the blue threads and blue coloration of the white threads, resulting in less contrast between the blue and white threads and abrasion points (i.e., a blue on blue look rather than the preferred white on blue). This redeposition is objectionable to some users.

[0010] Some cellulases are used commercially even though they cause backstaining, because of their higher activity on denim material. Either high specific activity or a high level of purity results in a higher degree of abrasion in a significantly shorter processing time and is therefore preferred by commercial denim processors.

[0011] Attempts to reduce the redeposition of dye have included adding extra chemicals or enzymes, such as surfactants, proteases or other agents, to the cellulase wash to help disperse the loosened dye. In addition, processors have used less active whole cellulase along with extra washings. However, this results in additional chemical costs and longer processing times. Finally, using enzymes and stones together leaves the processor with all the problems caused by using stones alone. Accordingly, it would be desirable to find a method to prevent redeposition of colorant during stonewashing with cellulases.

[0012] There have been previous attempts to prevent backstaining. Patent WO 92/06221 of Genencor pertains to backstaining and indicates that the cellobiohydrolase (CBH) found in fungal cellulases is largely responsible for strength loss of the fabric and that a 5 to 1 ratio of endoglucanase to CBH is desirable. WO 96/23928, also to Genencor, relates to the use of a truncated cellulase core enzyme. Both of these references emphasize the use of buffers to stabilize the cellulase solution in the wash environment. It is recognized in the art that cellulase activity is pH dependent. Most cellulases exhibit cellulolytic activity within an acidic to neutral pH range, and the pH of an unbuffered cellulase solution could be outside the range required for cellulolytic activity. This can be undesirable and requires the addition of reagents to lower the pH of the denim following the wash cycle, increasing the processing expense.

[0013] Applications of cellulases for textile processing and in commercial detergents demand proteins which are 
stable under highly alkaline conditions in the presence of surfactants as well as elevated temperatures. 

C BRIEF DESCRIPTION OF THE INVENTION 


[0014] Microorganisms from New Zealand hot springs are a recognized potential source of alkalophilic and thermophilic enzymes. We have examined numerous such microorganisms isolated from thermal pools for their cellulase activity under alkaline conditions. The approach used was to grow the isolated bacterial cultures on cotton in order to enrich for strains that contain cellulase activity. Selected strains were grown on a larger scale, and the culture supernatants were then individually screened for the desired stone-wash effect. A particular strain of unknown species, which most closely resembles those in the Caldicellulosiruptor genus and which we have designated Tok7B.1, was identified from this testing. Further investigation resulted in the discovery, in accord with this invention, of six different glycosidase-containing genes, designated A through F, which were identified and sequenced. These genes, or gene fragments, were selected for cellulase activity, cloned and expressed. The expressed proteins, especially those designated E1, E1/2, B5, B4/5 and E3/B5, were purified and characterized. These enzymes were shown to have alkaline activity profiles with maximal activity near pH 8.0. These proteins were tested in textile processing applications, including stone washing and anti-staining or anti-graying, as well as in other applications using alkaline pH and/or elevated temperatures, and demonstrated excellent properties in these applications. These highly active cellulase proteins, the DNA encoding these cellulase genes, and recombinant methods and means for producing the highly active cellulases are all provided by the invention.

[0015] This invention demonstrates that intact gene products are neither required nor necessarily desirable for use in many textile processing applications, and that the stability and functionality of these proteins can be varied dramatically by selectively combining different genetic fragments, thereby enhancing the activity of the novel proteins herein claimed. The stability-enhancing gene fragments can also be expressed with other cellulase genes to confer improved thermal or high alkaline stability on previously described cellulase proteins.

D SUMMARY OF THE INVENTION 



[0016] This invention describes thermophilic bacterial genes that encode multidomain proteins containing combinations of cellulase, xylanase or cellobiohydrolase activities. Truncated forms of these genes have demonstrated useful stonewash and detergent application activities with cotton cloth. Specific oligonucleotide sequences were identified that, when used as PCR primers, were shown to amplify genetic sequences that encode a series of protein domains containing glycohydrolase, thermal stabilizing and cellulose binding activities. A specific protein domain, designated CelE2, was shown to function as a thermal stabilizing domain. The addition of this domain to an endoglucanase increased its thermostability by 25 °C. This stabilizing activity could be widely applicable for enhancing the thermal stability of the products of other genes.

[0017] The genes were obtained from the thermophilic obligate anaerobic bacterium by PCR amplification of the genomic DNA. The synthetic oligonucleotide primer sequences used for the gene amplification reactions were based either on N-terminal protein sequence data, from which degenerate probes were designed, or on genomic expression library constructs that had been screened for cellulase, cellobiosidase or xylanase activities. These specific oligonucleotide probes can serve to amplify genes useful in stone washing and/or detergent applications from other, unknown bacteria that have cellulase genes.
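Paragraph [0017] notes that some primers were degenerate probes designed from N-terminal protein sequence. The actual primers are those listed in TABLE I and are not reproduced here. As a minimal sketch of the general technique only, the Python below back-translates a hypothetical peptide into a degenerate primer using the standard IUPAC ambiguity codes and counts how many distinct oligonucleotides that primer represents. The function names and the example peptide are illustrative assumptions, not part of the specification.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes -> the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Degenerate codon for each amino acid whose codons fit in one IUPAC triplet.
# Leu, Ser and Arg (six codons each) are deliberately omitted: no single
# degenerate triplet covers all their codons without also encoding other residues.
DEGENERATE_CODON = {
    "M": "ATG", "W": "TGG", "F": "TTY", "Y": "TAY", "H": "CAY", "Q": "CAR",
    "N": "AAY", "K": "AAR", "D": "GAY", "E": "GAR", "C": "TGY", "I": "ATH",
    "G": "GGN", "A": "GCN", "P": "CCN", "T": "ACN", "V": "GTN",
}


def back_translate(peptide: str) -> str:
    """Return a degenerate primer (5' to 3') encoding the given peptide."""
    try:
        return "".join(DEGENERATE_CODON[aa] for aa in peptide.upper())
    except KeyError as err:
        raise ValueError(f"no single degenerate codon for residue {err}") from None


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand(primer: str) -> list:
    """Enumerate every concrete sequence contained in a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in primer.upper()))]


# Hypothetical peptide, chosen only to illustrate the arithmetic.
primer = back_translate("MKDEW")
print(primer, degeneracy(primer))  # ATGAARGAYGARTGG 8
```

Lower degeneracy generally gives more specific priming, which is why stretches rich in Met, Trp and two-codon residues are usually preferred when choosing which part of an N-terminal sequence to convert into a probe.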

[0018] Encoded gene fragments from the amplified genes identified as having cellulase activity were expressed in E. coli, either singly or in combination with cellulose binding domains and/or thermal stabilizing domains. The expressed proteins were purified to homogeneity and characterized. Cotton-containing cloth treated with certain of these truncated gene constructs having endoglucanase domains and/or cellulose binding domains gave a stonewash appearance, and with other endoglucanase constructs gave a soil antiredeposition effect.

E BRIEF DESCRIPTION OF DRAWINGS 



[0019] Figures 1A and 1B are a composite drawing of protein bands containing cellulase activity purified from the supernatant broth of the Tok7B.1 organism, and their N-terminal sequences.
[0020] Figure 2 shows the results of the BLAST sequence homology search with the sequenced protein N-termini.
[0021] Figure 3 is a diagram of two consensus primers TokcelA and TokcelB and their relationship to other family 9 
cellulases. 

[0022] Figures 4A and 4B show the genomic walking primers and the regions amplified to obtain the complete celE gene and flanking regions. Figure 4C depicts a restriction map and the genetic domain structure of the celE gene sequence, including flanking upstream and downstream sequences.

[0023] Figure 5A is a map of W2-4 and N-17 genomic DNA fragments isolated from the Tok7B.1 genome that express cellulase activity. Figure 5B depicts the genomic walking primers and the regions amplified to obtain the complete celA and celB genes. The genetic domain structure and restriction map of celA and celB are shown in Figure 5C.
[0024] Figure 6 is a complete summary of the genetic domain structure of celA, celB and celE genes. 

[0025] Figures 7A and 7B are a map of the restriction sites and domain structure of the Tok7B.1 genes celC, celD, celE, celF, celG and celH. The genomic walking primers used to amplify and identify each of these genes, and the genetic regions amplified, are also indicated.

[0026] Figure 8 is a diagram of the genes and gene fragments transferred into pJLA602 controlled expression plasmid vectors.

[0027] Figure 9 is a phylogenetic analysis of the Tok7B.1 organism.

[0028] Figures 10-12 are flow diagrams for construction of the expression plasmids pMcelE-1 and pMcelE1-2.

[0029] Figure 13 is a flow diagram for construction of the expression plasmid pMcelE1-2-3.

[0030] Figure 14 is a flow diagram for construction of the expression plasmid pcelB4-5.

[0031] Figure 14A is a flow diagram for construction of the expression plasmid pcelE3/B5.
[0032] Figure 15 shows the sequence analysis and MALDI-TOF of the expressed cellulases.

[0033] TABLE I lists the oligonucleotide primers designed and synthesized for study of the cellulase genes in the Tok7B.1 organism.

[0034] TABLE II lists the oligonucleotides designed for PCR amplification and directional ligation of the Tok7B.1 genes into controlled expression vectors.
[0035] TABLE III shows the gene constructs expressed in E. coli under a T7 promoter.

[0036] TABLE IV is a summary of the T7-expressed cellulases, their pH rate profiles, thermal stabilities and effectiveness in the stonewash application.



F DETAILED DESCRIPTION OF THE INVENTION 


I DEFINITIONS 



[0037] "Cotton-containing fabric" means sewn or unsewn fabrics made of pure cotton or cotton blends, including cotton woven fabrics, cotton knits, cotton denims, cotton yarns and the like. When cotton blends are employed, the fabric should contain at least about 40 percent by weight cotton; preferably more than about 60 percent by weight cotton; and most preferably more than about 75 percent by weight cotton. When employed as blends, the companion material employed in the fabric can include one or more non-cotton fibers, including synthetic fibers such as polyamide fibers (for example, nylon 6 and nylon 66), acrylic fibers (for example, polyacrylonitrile fibers), polyester fibers (for example, polyethylene terephthalate), polyvinyl alcohol fibers (for example, Vinylon), polyvinyl chloride fibers, polyvinylidene chloride fibers, polyurethane fibers, polyurea fibers and aramid fibers.
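The blend thresholds in [0037] can be restated as a simple rule. The function below is a hypothetical helper written only to make the tiers explicit; it treats "about" as an exact cut-off, which the definition itself does not do.

```python
from typing import Optional


def cotton_blend_tier(cotton_wt_percent: float) -> Optional[str]:
    """Map the cotton content of a fabric (% by weight) to the preference tiers
    of the "cotton-containing fabric" definition. Hard cut-offs are a
    simplification of the "about" wording in the definition."""
    if not 0 <= cotton_wt_percent <= 100:
        raise ValueError("cotton content must be between 0 and 100 % by weight")
    if cotton_wt_percent > 75:
        return "most preferred"
    if cotton_wt_percent > 60:
        return "preferred"
    if cotton_wt_percent >= 40:
        return "within definition"
    return None  # below the minimum cotton content for a cotton blend


for blend in (100, 65, 40, 30):
    print(blend, cotton_blend_tier(blend))
```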

[0038] "Cellulose containing fabric" means any cotton or non-cotton containing cellulosic fabric or cotton or non-cotton containing cellulose blend, including natural cellulosics and manmade cellulosics (such as jute, flax, ramie, rayon, and the like). Included under the heading of manmade cellulose containing fabrics are regenerated fabrics that are well known in the art, such as rayon. Other manmade cellulose containing fabrics include chemically modified cellulose fibers (e.g., acetate-derivatized cellulose) and solvent-spun cellulose fibers (e.g., lyocell). Of course, included within the definition of cellulose containing fabric is any garment or yarn made of such materials. Similarly, "cellulose containing fabric" includes textile fibers made of such materials.

[0039] "Treating composition" means a composition comprising a truncated cellulase component which may be used in treating a cellulose containing fabric. Such treating includes, but is not limited to, stonewashing, modifying the texture, feel and/or appearance of cellulose containing fabrics, or other techniques used during manufacturing of cellulose containing fabrics. Additionally, treating within the context of this invention contemplates the removal of "dead cotton" from cellulosic fabric or fibers, i.e., immature cotton which is significantly more amorphous than mature cotton. Dead cotton is known to cause uneven dyeing. Additionally, "treating composition" means a composition comprising a truncated cellulase component which may be used in washing of a soiled manufactured cellulose containing fabric. For example, truncated cellulase may be used in a detergent composition for washing laundry. Detergent compositions useful in accordance with the present invention include special formulations such as pre-wash, pre-soak and home-use color restoration compositions. Treating compositions may be in the form of a concentrate which requires dilution or in the form of a dilute solution or form which can be applied directly to the cellulose containing fabric.
[0040] It is Applicants' present belief that the action pattern of cellulase upon cellulose containing fabrics does not 
differ significantly whether used as a stonewashing composition during manufacturing or during laundering of a soiled 
manufactured cellulose containing fabric. Thus, improved properties such as abrasion, redeposition of dye, strength 
loss and improved feel conferred by a certain cellulase or mixture of cellulases are obtained in both detergent and 
manufacturing processes incorporating cellulase. Of course, the formulations of specific compositions for the various 
textile applications of cellulase, e.g., stonewashing or laundry detergent or pre-soak, may differ due to the different 
applications to which the respective compositions are directed, as indicated herein. However, the improvements effected by the addition of cellulase compositions will be generally consistent across each of the various textile applications.

II PREPARATION OF TRUNCATED CELLULASE ENZYMES 

[0041] The present invention relates to the use of truncated cellulases and derivatives of truncated cellulases. These 
enzymes are preferably prepared by recombinant methods. Additionally, truncated cellulase proteins for use in the 
present invention may be obtained by other art recognized means such as chemical cleavage or proteolysis of complete 
cellulase protein. 

[0042] The invention provides recombinant cellulase proteins which are alkalophilic, thermophilic and highly active and which are useful in washing applications, or in any application, including textile processing, in which it is desirable to break down cellulose or cellulosic materials. It further provides DNA, free from its native genomic source, which encodes the recombinant cellulase-active proteins in accord with the invention. In another preferred embodiment of this invention, we also provide genomic DNA which can be used in recombinant expression vectors and expression systems to produce enhanced alkali and/or temperature stability properties in cellulases other than those specifically described.
[0043] Also provided by the invention are bacterial cells capable of producing a native cellulase in accord with the invention and from which DNA encoding cellulases in accord with the invention may be obtained. Also provided is the native cellulase purified with respect to its native origins and associated native proteins, such as by having a high protein purity, or even absolute purity, of at least 50%, e.g., 75%.

[0044] By way of specific preferred embodiments, this invention provides the following five particularly highly active 
cellulase proteins: E1, E1/2, B4/5, B5, and E3/B5. 

[0045] E1 has an amino acid sequence of 446 amino acids extending from amino acid position No Y39 through amino acid position No D481 as given in Seq. ID No 44, or a functionally equivalent analogue thereof. DNA encoding this cellulase may vary in accord with the genetic code, and a specific embodiment of such a DNA sequence comprises the DNA extending from nucleotide position No 748 through nucleotide position No 2076 as given in Sequence ID No 2.
[0046] E1/2 has an amino acid sequence of 600 amino acids extending from amino acid position No Y39 through amino acid position No G635 as given in Seq. ID No 44, or a functionally equivalent analogue thereof. DNA encoding this cellulase may vary in accord with the genetic code, and a specific embodiment of such a DNA sequence comprises the DNA extending from nucleotide position No 748 through nucleotide position No 2538 as given in Sequence ID No 2.
[0047] B4/5 has an amino acid sequence of 645 amino acids extending from amino acid position No K635 through amino acid position No N1426 as given in Seq. ID No 43, or a functionally equivalent analogue thereof. DNA encoding this cellulase may vary in accord with the genetic code, and a specific embodiment of such a DNA sequence comprises the DNA extending from nucleotide position No 8601 through nucleotide position No 10532 as given in Sequence ID No 1.

[0048] B5 has an amino acid sequence of 418 amino acids extending from amino acid position No A1001 through amino acid position No P1424 as given in Seq. ID No 43, or a functionally equivalent analogue thereof. The B5 protein can also end at K1425 or N1426, to include 419 or 420 amino acids, respectively. DNA encoding this cellulase may vary in accord with the genetic code, and a specific embodiment of such a DNA sequence comprises the DNA extending from nucleotide position No 9255 through nucleotide position No 10526 as given in Sequence ID No 1.
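Paragraphs [0045] to [0048] specify each construct as a range of amino acid positions together with a range of nucleotide positions. A minimal sketch of that bookkeeping follows, assuming 1-based coordinates that are inclusive at both ends (the convention is not stated explicitly here). An in-frame coding range must be a whole number of codons, and residue n of a coding sequence that starts at nucleotide s occupies nucleotides s + 3(n - 1) through s + 3(n - 1) + 2. The helper names are hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive position range (start..end)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def codon_count(nt_start: int, nt_end: int) -> int:
    """Number of codons in an in-frame, inclusive nucleotide range."""
    length = span_length(nt_start, nt_end)
    if length % 3:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return length // 3


def residue_to_nt(cds_start: int, residue: int) -> tuple:
    """Inclusive nucleotide range (first, last) of a residue, given the
    1-based position of the first nucleotide of the coding sequence."""
    if residue < 1:
        raise ValueError("residue numbering starts at 1")
    first = cds_start + 3 * (residue - 1)
    return first, first + 2


# Hypothetical small values, used only to illustrate the arithmetic.
print(codon_count(1, 9))         # 3
print(residue_to_nt(10, 2))      # (13, 15)
```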
[0049] E3/B5 has an amino acid sequence of 616 amino acids and is a hybrid protein formed from sequences taken from the E and the B portions of the native sequences. The celB sequence is that described from amino acid position No K635 through amino acid position No N1426 as given in Seq. ID No 43. DNA encoding this hybrid cellulase may vary in accord with the genetic code, but a specific embodiment of such a DNA sequence comprises the DNA starting from the celE gene at G2659 and ending at G3123, as given in Sequence ID No 2, joined to a segment taken from the celB gene starting at G9153 and ending at A10532, as given in Sequence ID No 1, or functionally equivalent analogues thereof. This E3/B5 protein and its nucleotide sequence are described in Seq ID Nos 46 and 47, respectively.
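[0049] describes the E3/B5 coding sequence as a celE segment (G2659 to G3123 of Sequence ID No 2) joined to a celB segment (G9153 to A10532 of Sequence ID No 1). The sketch below shows such a splice in Python, again assuming 1-based inclusive coordinates. The toy sequences are placeholders, since the real sequences appear only in the sequence listing, and the per-segment frame check is a design choice of this sketch rather than something the text specifies.

```python
def extract(seq: str, start: int, end: int) -> str:
    """Return seq[start..end] using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}..{end} outside sequence of length {len(seq)}")
    return seq[start - 1:end]


def fuse(segments: list) -> str:
    """Join (sequence, start, end) segments in order. Each segment must be a
    whole number of codons so that the downstream segment stays in frame."""
    parts = []
    for seq, start, end in segments:
        part = extract(seq, start, end)
        if len(part) % 3:
            raise ValueError(f"segment {start}..{end} is not a whole number of codons")
        parts.append(part)
    return "".join(parts)


# Hypothetical toy sequences standing in for Sequence ID No 2 (celE) and No 1 (celB).
cel_e = "AAATTTGGGCCC"
cel_b = "ATGCCCTAA"
print(fuse([(cel_e, 4, 9), (cel_b, 1, 6)]))  # TTTGGGATGCCC

# With the real sequences loaded as strings (not reproduced here), the call
# would be: fuse([(seq_id_2, 2659, 3123), (seq_id_1, 9153, 10532)])
```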
[0050] As will be recognized by those skilled in the art, DNA encoding active cellulases in accord with the invention may be modified in various ways to produce such cellulases for practical use. For example, the DNA encoding a signal sequence may be removed and replaced, using known techniques, by the codon ATG encoding Met at amino acid position No. 31. The resulting DNA, which lacks a signal sequence, may be used to express active cellulase in accord with the invention, more particularly in E. coli; depending on the host strain, the cellulase product may be produced with or without Met at its N-terminus, or as a mixture of both forms. Similarly, the signal sequence may be replaced by known techniques with other signal sequences to improve production, particularly secretion into the production media, and/or to adapt the DNA to particular hosts for production.
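[0050] describes removing the DNA encoding the signal sequence and supplying an ATG codon for Met. A minimal sketch of one reading of that edit at the sequence level follows: the signal codons are dropped and ATG is prepended to the remaining mature coding sequence. The number of signal codons is a parameter because the exact signal length should be taken from the sequence listing; the function name and toy sequence are hypothetical.

```python
def replace_signal_with_met(cds: str, signal_codons: int) -> str:
    """Drop the first `signal_codons` codons of a coding sequence and put an
    ATG (Met) start codon in front of the remaining mature sequence."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not 0 < signal_codons < len(cds) // 3:
        raise ValueError("signal length must leave a non-empty mature sequence")
    return "ATG" + cds[3 * signal_codons:]


# Hypothetical 5-codon CDS with a 2-codon "signal" (real signal peptides are longer).
print(replace_signal_with_met("ATGAAACCCGGGTTT", 2))  # ATGCCCGGGTTT
```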

[0051] The cellulase gene-containing inserts cloned and provided in accord with our invention contain all the control 
or regulatory sequences necessary for expression of the structural gene in bacterial hosts, particularly Bacillus and E. 
coli hosts. These sequences, such as promoter sequences, ribosome binding site sequences and the like may also 
be modified or replaced in whole or in part by other control sequences using known techniques to improve production 
and/or to adapt the DNA to particular hosts for production. When such a change is made, the resulting DNA sequence 
is deemed to involve the structural gene in sequence with heterologous DNA. 

[0052] The DNA encoding an active alkalophilic and thermophilic cellulase in accord with the invention may be incorporated into a wide variety of vectors for various purposes, such as replication of such DNA, expression of the structural gene, or incorporation of the DNA into the genome of a host cell for ultimate expression of the encoded gene. Such vectors will typically involve DNA sequences containing the DNA encoding the active cellulase recombined with other heterologous DNA. The term heterologous DNA and the like, as used herein, generally refers to a DNA sequence which has a functional purpose and which is either different from the sequences in, or obtained from a source other than, the native Tok7B.1 DNA from which the instant gene was cloned, thereby creating a continuous sequence which is not found in or associated with the cellulase gene in the native Tok7B.1 source. Examples of such functional sequences are many and include, for purposes of illustration, origins of replication, genes for antibiotic resistance and various control sequences, such as promoter sequences to be used for effecting expression of the structural gene itself, as well as flanking sequences suitable for causing insertion of DNA containing the gene coding sequence into a host genome. Such vectors include, for illustration only, those commonly referred to as plasmids and those which are viral vectors. The construction of vectors is well known, and DNA sequences of widely different origins and/or recombinations are available for such construction, such sequences also commonly being called plasmids, viral vectors and the like. For example, a vector in accord with the invention and used by us can be obtained from the known plasmid pUC18, which contains the pBR322-derived ampicillin resistance gene and origin of replication, together with a portion of the E. coli lacZ gene (lacZ') encoding the α-complementation peptide.

[0053] This lacZ' fragment has been engineered to contain a multiple cloning site (MCS). DNA inserted into the MCS 
inactivates the lacZ' gene, providing blue/white color selection of recombinants when appropriate hosts and indicator 
plates are used. The complete gene or clone we obtained can be inserted or ligated into the MCS and expressed in 
an E. coli host by operation of its own native control sequences. 

[0054] In general, the vectors of the invention are constructed with reference to suitability for incorporation into par- 
ticular host cells, and such transformed cells are also a part of the invention. As used herein, the term "transformed" 
and the like means the incorporation of vector DNA into a host cell independent of the purpose in terms of replication 
of the recombinant gene or its expression, or both, and whether the vector DNA remains intact in the cell or its contained 
cellulase encoding gene is incorporated for expression into the cell genome. The vectors of the invention may be 
transformed into any of a variety of cell types, such as bacterial cells, yeast cells, insect cells and mammalian cells. Preferably,
the transformed cells are bacteria or yeast cells, and more preferably are gram negative bacteria such as E. coli or 
gram positive bacteria such as Streptomyces or Bacillus cells where such Bacillus cells are not of thermophilic source, 
such preferred Bacillus types including Bacillus subtilis and the like. Methods for transforming cells with vectors are 
generally well-known. 

[0055] The invention also provides a process for producing the recombinant cellulase active proteins of the invention 
comprising culturing cells transformed with a recombinant expression vector of the invention comprising promoter DNA 
operatively controlling expression of the DNA encoding the cellulase protein. Methods of culturing such transformed 
cells to effect their multiplication and expression of the cellulase encoding gene of the transformed vector DNA are 
also well-known. Procedures for recovery of the recombinantly produced proteins are also known and may be used to 
obtain the cellulase of the invention in the more practical forms for use. In general, the recombinantly produced cellulase 
as expressed by the transformed cells may be retained within the cells and/or secreted into the culture medium. When it is retained in quantity within the cells, the cells are lysed, for example in a Waring blender, sonifier or pressure cell, to liberate the cellulase into the culture medium, which is then usually treated to separate cellular debris and preferably filtered to obtain the cellulase in the resulting aqueous supernatant or filtrate. When the cellulase is secreted into the medium, the culture liquid or supernatant containing the cellulase is simply separated from the cells. Such filtrates and supernatants may then be used as a basis for a product for treatment of cellulosic materials, typically after concentration. Such cellulase-containing liquids may also be treated, for example by microfiltration, to separate undesired materials including lower molecular weight proteins. The resulting aqueous cellulase-containing compositions may also be treated to enhance their storage or use properties, for example by addition of buffers to enhance stability of the cellulase. Accordingly, the cellulase products may be buffered between pH 5 and 10, preferably pH 7 to 9, using, for example, Tris buffer.
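The Tris buffering mentioned in [0055] can be planned with the Henderson-Hasselbalch equation, pH = pKa + log10([base]/[acid]). The sketch below assumes commonly quoted literature values for Tris (pKa about 8.07 at 25 °C, changing by about -0.028 per °C); these constants, the linear temperature model and the function names are assumptions of the sketch and should be checked for a specific formulation. With pKa near 8.1, Tris buffers usefully over roughly pH 7 to 9, which coincides with the preferred range above.

```python
TRIS_PKA_25C = 8.07    # assumed literature pKa of Tris at 25 °C
TRIS_DPKA_DT = -0.028  # assumed change in Tris pKa per °C (linear model)


def tris_pka(temp_c: float) -> float:
    """Approximate Tris pKa at temp_c, assuming a linear temperature coefficient."""
    return TRIS_PKA_25C + TRIS_DPKA_DT * (temp_c - 25.0)


def tris_recipe(target_ph: float, total_mm: float, temp_c: float = 25.0) -> tuple:
    """Split a total Tris concentration (mM) into free base and conjugate acid
    (Tris-HCl) via Henderson-Hasselbalch: pH = pKa + log10([base]/[acid])."""
    if total_mm <= 0:
        raise ValueError("total concentration must be positive")
    ratio = 10 ** (target_ph - tris_pka(temp_c))  # [base]/[acid]
    base = total_mm * ratio / (1 + ratio)
    return base, total_mm - base


base, acid = tris_recipe(8.07, 50.0)
print(round(base, 1), round(acid, 1))  # 25.0 25.0
```

Under these assumptions the pKa falls by roughly 0.3 units for every 10 °C rise, so a Tris buffer adjusted at room temperature will sit at a lower pH in a hot wash liquor; adjusting the pH at the working temperature avoids that drift.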
[0056] The cellulases of the present invention have been found to be particularly useful as additives for the cleaning or treatment of cellulose fabrics, including cotton-containing fabrics. They exhibit high activity even at high temperatures or high pH, making them suitable for aqueous detergent solutions and formulations.
[0057] It will be recognized that the cellulases of the invention are obtained from a microorganism characteristic of those which are thermophilic and alkalophilic and which produce a variety of enzymes that may similarly be characterized as favoring the conditions encountered in naturally heated alkaline pools. A variety of microorganisms have been identified in such pools. The cellulases of this invention originate from a particular strain of unknown species, which most closely resembles those in the Caldicellulosiruptor genus and which we have designated Tok7B.1.

III DEPOSITS

[0058] We have, under the Budapest Treaty conditions, deposited with the American Type Culture Collection at Rockville, MD, USA, a biologically pure culture of the cells indicated below; the deposits were assigned the Accession Numbers given below, along with their dates of deposit:



Identification and Content of Deposit    Accession No.    Deposit Date
E. coli BL21 (DE3) CelE                  ATCC 98523       August 29, 1997
E. coli DHα F'IQ CelB                    ATCC 98524       August 29, 1997
Tok7B.1 bacterial strain                 ATCC 202028      September 10, 1997



[0059] As will be recognized, any of the above deposits may be cultured under conditions that cause expression of a
cellulase of the invention in accord with the experiments described herein, and such cellulase products may be recovered in a
variety of product forms for use as also described herein. Alternatively, cultures of the deposited cells may be grown
to multiply the number of copies of their contained plasmid clones, and the cellulase gene and coding sequence may
be separated from the plasmids by the use of restriction enzymes, preferably by partial digestion with Sau3AI. The
DNA encoding the cellulase (for example, an approximately 1.57 kb fragment obtained by partial Sau3AI digestion) may then be used
for a variety of purposes, including production of active cellulase protein of the invention in a wide variety of other
expression systems.

IV METHODS OF TREATING CELLULOSE CONTAINING FABRIC USING TRUNCATED CELLULASE ENZYMES 

[0060] As noted above, the present invention pertains to methods for treating cellulose containing fabrics with a 
truncated cellulase enzyme. The use of the truncated cellulase composition of this invention provides the novel and 
surprising result of effecting a relatively low level of dye redeposition while maintaining an equivalent level of abrasion 
compared to prior art cellulase treatment. Because the level of abrasion acts as an indicator of the quality and effectiveness
of a particular cellulase treatment technique, e.g., stonewashing or laundering, the use of the instant invention
provides a surprisingly high quality textile treatment composition. In the laundering context, abrasion is sometimes 
referred to as color clarification, defuzzing or biopolishing. 

[0061] The present invention specifically contemplates the use of truncated cellulase core, alone or in combination 
with additional cellulase components, to achieve excellent abrasion with reduced redeposition when compared to non- 
truncated cellulase. Additionally, naturally occurring cellulase enzymes which lack a binding domain are contemplated 
as within the scope of the invention. It is also contemplated that the methods of this invention will provide additional 
enhancements to treated cellulose containing fabric, including improvement in the feel and/or appearance of the fabric. 

A) METHODOLOGY FOR STONEWASHING WITH TRUNCATED CELLULASE COMPOSITIONS 

[0062] According to one aspect of the present invention, the truncated cellulase compositions described above may 
be employed as a stonewashing composition. Preferably, the stonewashing composition of the instant invention comprises
an aqueous solution which contains an effective amount of a truncated cellulase together with other optional
ingredients including, for example, a buffer, a surfactant, and a scouring agent.

[0063] An effective amount of truncated cellulase enzyme composition is a concentration of truncated cellulase enzyme
sufficient for its intended purpose. Thus an "effective amount" of truncated cellulase in the stonewashing composition
according to the present invention is that amount which will provide the desired treatment, e.g., stonewashing.
The amount of truncated cellulase employed is also dependent on the equipment employed, the process parameters
employed (the temperature of the truncated cellulase treatment solution, the exposure time to the cellulase solution,
and the like), and the cellulase activity (e.g., a particular solution will require a lower concentration of cellulase where
a more active cellulase composition is used as compared to a less active cellulase composition). The exact concentration
of truncated cellulase can be readily determined by the skilled artisan based on the above factors as well as
the desired result. Preferably the truncated cellulase composition is present in a concentration of from 1-1000 PPM,
more preferably 10-400 PPM and most preferably 20-100 PPM total protein.

[0064] Optionally, a buffer is employed in the stonewashing composition such that the concentration of buffer is that
which is sufficient to maintain the pH of the solution within the range wherein the employed truncated cellulase exhibits
activity which, in turn, depends on the nature of the truncated cellulase employed. The exact concentration of buffer
employed will depend on several factors which the skilled artisan can readily take into account. For example, in a
preferred embodiment, the buffer as well as the buffer concentration are selected so as to maintain the pH of the final
truncated cellulase solution within the pH range required for optimal cellulase activity. Preferably, buffer concentration
in the stonewashing composition is about 0.001 N or greater. Suitable buffers include, for example, citrate and acetate.

[0065] In addition to truncated cellulase and a buffer, the stonewashing composition may optionally contain a surfactant.
Preferably, the surfactant is present in the diluted wash medium at a concentration of greater than 100 PPM,
more preferably from about 200-15,000 PPM. Suitable surfactants include any surfactant compatible with the cellulase and
the fabric including, for example, anionic, non-ionic and ampholytic surfactants. Suitable anionic surfactants for use
herein include linear or branched alkylbenzenesulfonates; alkyl or alkenyl ether sulfates having linear or branched alkyl
groups or alkenyl groups; alkyl or alkenyl sulfates; olefinsulfonates; alkanesulfonates and the like. Suitable counter
ions for anionic surfactants include alkali metal ions such as sodium and potassium; alkaline earth metal ions such as
calcium and magnesium; ammonium ion; and alkanolamines having 1 to 3 alkanol groups of carbon number 2 or 3.
Ampholytic surfactants include quaternary ammonium salt sulfonates and betaine-type ampholytic surfactants. Such
ampholytic surfactants have both positively and negatively charged groups in the same molecule. Nonionic surfactants
generally comprise polyoxyalkylene ethers, as well as higher fatty acid alkanolamides or alkylene oxide adducts thereof,
and fatty acid glycerine monoesters. Mixtures of surfactants can also be employed in manners known in the art.
[0066] In a preferred embodiment, a concentrated stonewashing composition can be prepared for use in the methods
described herein. Such concentrates would contain concentrated amounts of the truncated cellulase composition described
above, buffer and surfactant, preferably in an aqueous solution. When so formulated, the stonewashing concentrate
can readily be diluted with water so as to quickly and accurately prepare stonewashing compositions according
to the present invention having the requisite concentration of these additives.

[0067] Preferably, such concentrates will comprise from about 0.1 to about 50 weight percent of a cellulase composition
described above (protein); from about 0.1 to about 80 weight percent buffer; and from about 0 to about 50 weight
percent surfactant, with the balance being water. When aqueous concentrates are formulated, these concentrates can
be diluted so as to arrive at the requisite concentration of the components in the truncated cellulase solution as indicated
above. As is readily apparent, such stonewashing concentrates will permit facile formulation of the truncated cellulase
solutions as well as feasible transportation of the concentrate to the location where it will be used. The stonewashing
concentrate can be in any art-recognized form, for example, liquid, emulsion, gel, or paste. Such forms are
well known to the skilled artisan.
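To illustrate how the concentrate ranges of [0067] relate to the in-use concentrations of [0063], the following Python sketch computes a dilution factor and the amount of concentrate to add to a bath. It assumes that "PPM total protein" means milligrams of protein per litre of bath (a dilute aqueous bath of about 1 kg/L), that the concentrate's own volume is negligible, and that the protein weight percent refers to the concentrate itself; the function names and example figures are hypothetical.

```python
def dilution_factor(protein_wt_pct: float, target_ppm: float) -> float:
    """Fold-dilution taking a concentrate at protein_wt_pct (w/w) down to target_ppm.

    1 wt% = 10,000 PPM, so a 10 wt% concentrate is 100,000 PPM protein.
    """
    return protein_wt_pct * 10_000 / target_ppm


def concentrate_grams(target_ppm: float, bath_litres: float, protein_wt_pct: float) -> float:
    """Grams of concentrate needed so the bath reaches target_ppm total protein.

    Assumes 1 PPM = 1 mg protein per litre of bath.
    """
    if not 1 <= target_ppm <= 1000:
        raise ValueError("outside the 1-1000 PPM range stated in [0063]")
    mg_needed = target_ppm * bath_litres
    mg_protein_per_gram = protein_wt_pct * 10  # 1 wt% of 1 g = 10 mg
    return mg_needed / mg_protein_per_gram


# Hypothetical example: 10 wt% protein concentrate, 100 L bath, 50 PPM target.
print(dilution_factor(10, 50))         # 2000.0
print(concentrate_grams(50, 100, 10))  # 50.0
```

Working in milligrams per litre keeps the arithmetic transparent; a plant with a different liquor density would need a correction factor.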

[0068] Other materials can also be used with or placed in the stonewashing composition of the present invention as
desired, including stones, pumice, fillers, solvents, enzyme activators, and other anti-redeposition agents.
[0069] The cellulose containing fabric is contacted with the stonewashing composition containing an effective amount
of the truncated cellulase enzyme or derivative by intermingling the treating composition with the stonewashing composition,
and thus bringing the truncated cellulase enzyme into proximity with the fabric. For example, if the treating
composition is an aqueous solution, the fabric may be directly soaked in the solution. Similarly, where the stonewashing
composition is a concentrate, the concentrate is diluted into a water bath with the cellulose containing fabric. When
the stonewashing composition is in a solid form, for example a pre-wash gel or solid stick, the stonewashing composition
may be contacted by directly applying the composition to the fabric or to the wash liquor.

[0070] The cellulose containing fabric is incubated with the stonewashing solution under conditions effective to allow
the enzymatic action to confer a stonewashed appearance to the cellulose containing fabric. For example, during
stonewashing, the pH, liquor ratio, temperature and reaction time may be adjusted to optimize the conditions under
which the stonewashing composition acts. "Effective conditions" necessarily refers to the pH, liquor ratio, and temperature
which allow the truncated cellulase enzyme to react efficiently with cellulose containing fabric. The reaction conditions
for truncated cellulase core, and thus the conditions effective for the stonewashing compositions of the present
invention, are substantially similar to well known methods used with corresponding non-truncated cellulases. Similarly,
where a mixture of truncated and non-truncated cellulase is utilized, the conditions should be optimized as they would be
for a similar combination. Accordingly, it is within the skill of those in the art to maximize conditions
for using the stonewashing compositions according to the present invention.

[0071] The liquor ratio during stonewashing, i.e., the ratio of the weight of stonewashing composition solution (i.e., the
wash liquor) to the weight of fabric, employed herein is generally an amount sufficient to achieve the desired stonewashing
effect in the denim fabric and is dependent upon the process used. Preferably, the liquor ratios are from about
4:1 to about 50:1, more preferably from about 5:1 to about 20:1, and most preferably from about 10:1 to about 15:1. Reaction
temperatures during stonewashing with the present stonewashing compositions are governed by two competing factors.
Firstly, higher temperatures generally correspond to enhanced reaction kinetics, i.e., faster reactions, which permit
reduced reaction times as compared to reaction times required at lower temperatures. Accordingly, reaction temperatures
are generally at least about 10°C. Secondly, cellulase is a protein which loses activity beyond a
given reaction temperature, which temperature is dependent on the nature of the cellulase used. Thus, if the reaction
temperature is permitted to go too high, then the cellulolytic activity is lost as a result of the denaturing of the cellulase.
As a result, the maximum reaction temperatures employed herein are generally about 65°C. In view of the above,
reaction temperatures are generally from about 30°C to about 65°C; preferably, from about 35°C to about 60°C, and
more preferably, from about 35°C to about 55°C.
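As a rough illustration of how the liquor-ratio and temperature ranges of [0071] are applied, the sketch below computes the liquor weight for a fabric load and reports which of the stated bands a chosen setting falls in. Treating the band edges as inclusive and the example load are assumptions, and the functions are hypothetical helpers rather than part of the described method.

```python
def wash_liquor_kg(fabric_kg: float, liquor_ratio: float) -> float:
    """Weight of wash liquor for a fabric load at liquor_ratio (liquor : fabric, by weight)."""
    return fabric_kg * liquor_ratio


def liquor_ratio_band(ratio: float) -> str:
    """Narrowest liquor-ratio band from [0071] that contains ratio (edges inclusive)."""
    for name, lo, hi in (("most preferred", 10, 15),
                         ("more preferred", 5, 20),
                         ("preferred", 4, 50)):
        if lo <= ratio <= hi:
            return name
    return "outside stated range"


def temperature_band(temp_c: float) -> str:
    """Narrowest reaction-temperature band from [0071] that contains temp_c (edges inclusive)."""
    for name, lo, hi in (("more preferred", 35, 55),
                         ("preferred", 35, 60),
                         ("general", 30, 65)):
        if lo <= temp_c <= hi:
            return name
    return "outside stated range"


# Hypothetical load: 20 kg of denim at a 12:1 liquor ratio, run at 50 °C.
print(wash_liquor_kg(20, 12))  # 240
print(liquor_ratio_band(12))   # most preferred
print(temperature_band(50))    # more preferred
```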

[0072] Reaction times are dependent on the specific conditions under which the stonewashing occurs. For example,
pH, temperature and concentration of truncated cellulase will all affect the optimal reaction time. Generally, reaction
times are from about 5 minutes to about 5 hours, preferably from about 10 minutes to about 3 hours and, more
preferably, from about 20 minutes to about 1 hour.

[0073] Cellulose containing fabrics treated in the stonewashing methods described above using truncated cellulase
compositions according to the present invention show reduced redeposition of dye as compared to the same cellulose
containing fabrics treated in the same manner with a non-truncated cellulase composition.

B) METHODOLOGY FOR TREATING CELLULOSE CONTAINING FABRICS WITH A DETERGENT COMPOSITION
COMPRISING TRUNCATED CELLULASE ENZYME

[0074] According to the present invention, the truncated cellulase composition described above may be employed
in detergent compositions. The detergent compositions according to the present invention are useful as pre-wash
compositions, pre-soak compositions, or for detergent cleaning during the regular wash cycle. Preferably, the detergent
composition of the present invention, which can be dry mixed or in an aqueous liquid formulation, comprises an effective
amount of truncated cellulase and a surfactant, and optionally includes other ingredients and additives commonly employed
in detergent formulations. An effective amount of truncated cellulase employed in the detergent compositions
of this invention is an amount sufficient to impart improved anti-graying, anti-staining, anti-backstaining, or anti-soil
deposition properties to cotton or cellulosic containing fabrics. Preferably, the truncated cellulase employed is in a concentration
of about 0.001% to about 25%, more preferably about 0.02% to about 10%, by weight of detergent.
[0075] The specific concentration of truncated cellulase enzyme employed in the detergent composition is preferably
selected so that upon dilution into a wash medium, the concentration of truncated cellulase enzyme is in a range of
about 0.1 to about 1000 PPM, preferably from about 0.2 PPM to about 500 PPM, and most preferably from about 0.5
PPM to about 250 PPM total protein. Thus, the specific amount of truncated cellulase enzyme employed in the
detergent composition will depend on the extent to which the detergent will be diluted upon addition to water so as to
form a wash solution.

[0076] At lower concentrations of truncated cellulase enzyme, i.e., concentrations of truncated enzyme lower than
20 PPM, the decreased backstaining or redeposition with equivalent surface fiber abrasion when compared to prior
art compositions will become evident after repeated washings. At higher concentrations, i.e., concentrations of truncated
cellulase enzymes of greater than 40 PPM, the decreased backstaining with equivalent surface fiber removal
will become evident after a single wash.
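Paragraphs [0074] to [0076] tie the cellulase content of the detergent to the concentration reached in the wash liquor. A minimal sketch of that arithmetic follows, assuming 1 PPM is about 1 mg/L in the wash liquor and that the cellulase weight percent is expressed as protein. The dosing figures are hypothetical, and the 20-40 PPM interval is reported as unspecified because the text does not address it.

```python
def wash_ppm(detergent_g_per_l: float, cellulase_wt_pct: float) -> float:
    """Cellulase (PPM total protein) in the wash from a detergent dosed at detergent_g_per_l.

    Assumes 1 PPM = 1 mg/L in dilute wash liquor and that cellulase_wt_pct
    is expressed as protein.
    """
    return detergent_g_per_l * cellulase_wt_pct * 10  # 1 wt% of 1 g = 10 mg


def expected_benefit(ppm: float) -> str:
    """Onset of reduced backstaining as described in [0076]."""
    if ppm < 20:
        return "after repeated washings"
    if ppm > 40:
        return "after a single wash"
    return "not specified (20-40 PPM)"


# Hypothetical doses: 1 g/L of a 1 wt% cellulase detergent, and 2 g/L of a 2.5 wt% one.
for dose, pct in ((1.0, 1.0), (2.0, 2.5)):
    ppm = wash_ppm(dose, pct)
    print(ppm, expected_benefit(ppm))
```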

[0077] This invention is illustrated by the following procedures and examples.

[0078] Applications of cellulases for textile processing and in commercial detergents demand proteins that are stable 
under conditions of alkaline pH and elevated temperatures. 






V EXAMPLES 

Isolation of cellulase secreting microorganisms from alkaline thermal pools

[0079] To identify thermally stable glycolytic proteins, microorganisms were isolated from water and sediment
samples taken from geothermal pools in the central volcanic region of New Zealand's North Island. The criteria for the
pools sampled were temperatures of at least 50°C and pH values greater than 6.0. A total of twenty samples were
collected from geothermal pools that met the criteria. Each of the samples contained a complex mixture of microorganisms.
In order to enrich the samples for microorganisms that expressed cellulase genes with the desired cellulase
activity, 1 mL volumes of the collected samples were inoculated into 10 mL of 2/1 medium in Hungate tubes containing
either amorphous cellulose (7 g/L) or unbleached cotton fabric (approximately 1 cm square) as cellulose substrate, at
pH 7.0 and pH 8.5. These tubes were incubated at 70°C and the cultures viewed microscopically after 4 days. The
enrichment strategy was based on the assumption that the presence of the cellulosic fibers would induce expression
of the cellulase genes in the microorganisms, and that those microorganisms would flourish under these conditions.
From this collection of organisms the anaerobic cellulase producer Tok7B.1 was isolated from a water/sediment sample
taken from Tokaanu Pool 7, situated in the central volcanic region of the North Island, New Zealand. The pH and
temperature of this particular pool at the time of sampling were pH 7.5 and 60°C, respectively.
[0080] The 2/1 medium with amorphous cellulose at pH 7.0 proved the most favorable for the growth of the anaerobic
rods from the Tok7B.1 sample, and after further subculturing, PAHBAH (p-hydroxybenzoic acid hydrazide) assays
(Lever, 1973) on the concentrated supernatant confirmed the presence of cellulase-producing organisms. The substrate
for these PAHBAH assays was 0.2% carboxymethyl cellulose (low viscosity) in 100 mM TAPS buffer, pH 8.8, at
20°C.

[0081] A pure culture of Tok7B.1 was obtained using a version of the Roll Tube method described by Hungate (1969).
Serial dilutions of the positive cultures were made in Hungate tubes containing the growth medium + 18 g/l agar. The
agar/culture mixture was solidified around the inside of the sealed tube by rolling in a flat dish containing iced water.
Tubes were incubated at 70°C and single colonies removed aseptically using a Pasteur pipette with the tip bent at right
angles. A plug of agar was placed in liquid medium and the cells released by crushing against the side of the tube.
Positive identification of a cellulase producer was again confirmed by PAHBAH assays of the culture supernatants. To
detect secreted cellulases, supernatants from the cultures were concentrated approximately tenfold prior to being
assayed for cellulase activity using the CMCase assay.

[0082] The Tok7B.1 cellulases were identified in a secondary screening assay that served to evaluate the biostone
washing effectiveness of the cellulases secreted into the sample supernatant. Each of the cultures selected for screening
was fermented in sufficient quantity, and the supernatants were concentrated, to provide sufficient activity for the
biostone wash testing (approximately 10,000 CMCase units). The supernatants were tested in a 2 L drum denim assay
at equivalent levels of CMCase activity under the following conditions: 135 g of blue denim samples were washed
for 60 minutes at pH 7.0 and 50°C. The light reflectance values of the blue denim cloth and of a swatch of white
cloth included in the wash were measured to determine the levels of denim abrasion and backstaining, respectively.
Blue denim samples that demonstrated a reflectance value above 15 and a dose-dependent effect with increasing
concentrations of fermentation supernatants were considered to contain candidate cellulases. White cloth swatches
with a reflectance below 4 were acceptable for backstaining. Based on these tests the Tok7B.1 organism was found
to produce the most effective cellulases, giving the highest abrasion with the lowest backstaining of the samples tested.
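The acceptance criteria of [0082] can be written as a simple predicate. This is a sketch only: the thresholds are those stated above, but reading "dose dependent" as a strictly rising denim reflectance with each dose increase, and applying the white-swatch threshold to a single reading, are assumptions. The readings in the example are invented for illustration and are not data from the examples.

```python
def is_candidate(dose_reflectance, white_reflectance,
                 abrasion_min=15.0, backstain_max=4.0):
    """Apply the screening criteria of [0082] to one supernatant.

    dose_reflectance: (dose, blue-denim reflectance) pairs.
    white_reflectance: reflectance reading for the white swatch.
    """
    values = [r for _, r in sorted(dose_reflectance)]
    abrades = max(values) > abrasion_min
    # Assumption: "dose dependent" = reflectance rises with every dose increase.
    dose_dependent = len(values) > 1 and all(b > a for a, b in zip(values, values[1:]))
    acceptable_backstain = white_reflectance < backstain_max
    return abrades and dose_dependent and acceptable_backstain


# Hypothetical readings.
print(is_candidate([(1, 12.0), (2, 15.5), (4, 18.2)], 3.1))  # True
print(is_candidate([(1, 16.0), (2, 16.0)], 2.0))             # False: no dose response
```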

Strategy for Identifying industrially useful cellulases 

[0083] Our strategy was to identify industrially useful cellulases secreted from the Tok7B.1 organism, then to identify 
the individual genes responsible for that activity. The following steps were carried out to clone the individual genes, 
express these genes in an intermediary expression system and test the individual cellulases in the application. The 
first step in the strategy was to identify the individual proteins secreted by the Tok7B.1 bacterium. Identification of the 
individual cellulases secreted by the bacterium was important because identification of the genes effective in the ap- 
plication would limit the number of cellulase genes and gene constructs that would have to be expressed and tested. 

Cellulase Nomenclature 

[0084] Genes and genetic constructs are designated in small letters and are italicized, for example the genes that 
encode the CelE proteins are designated celE. Conversely proteins are designated by capitalizing the first letter and 
are not italicized, for example, CelE1. The Tok7B.1 cellulase genetic domains are designated in Figure 6, and one
should be careful not to confuse these with the protein designations shown in the third column of Table III. For example,
the CelE1 protein corresponds to the second genetic domain in the celE gene.

Identification of N-terminal Sequences of Tok7B.1 cellulases

[0085] The culture supernatant from the Tok7B.1 strain was chromatographed on a Mono-S column (Pharmacia) at 






EP0 921 188 A2 



pH 5.0 in 10 mM sodium acetate buffer at a flow rate of 1 ml/min. The bound proteins were eluted with a 30 ml linear
gradient of NaCl from 0-250 mM. Each of the fractions collected was assayed for CMCase activity. 73% of the total
CMCase activity was collected in the fractions and 27% of the activity was found in the column flow-through. The proteins
from fractions that demonstrated CMCase activity were electrophoresed on an 8% SDS polyacrylamide gel. Protein
bands in fractions containing cellulase activity could be observed in a Coomassie-stained 8% SDS polyacrylamide gel.
The cellulase activity of these bands was confirmed in part by overlaying the SDS polyacrylamide gel with an agarose
gel containing carboxymethyl cellulose (CMC). Cellulases not denatured by the SDS degrade the CMC in the agarose
gel. These areas of degraded CMC can be identified by staining with Congo Red using the methods of Béguin (1983)
and Mackenzie and Williams (1984). Proteins of interest were blotted from the SDS-PAGE gel onto an Immobilon
membrane and the amino-terminal sequences were then determined by Edman degradation (Matsudaira, 1987). The
sequences determined for each of the individual bands are shown in a composite drawing (Figure 1). CMCase activity
that was not captured on the Mono-S column was subsequently buffer-exchanged into 12 mM Tris buffer pH 9.0,
chromatographed on a Q Sepharose column (1.5 x 6 cm), and eluted with a 30 mL linear gradient of 0-250 mM NaCl.
Fractions that contained CMCase activity were electrophoresed on an 8% SDS-PAGE gel and gave a protein band with
identical apparent molecular weight and N-terminal sequence to the B5 band (Figure 1) previously identified from the
S-sepharose column.
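Because the Mono-S run used a 30 ml linear gradient at 1 ml/min, the NaCl concentration at which a fraction eluted can be estimated from the elapsed volume. The sketch below assumes an ideal gradient with no column dead volume or mixing delay, which real systems only approximate; the function name is hypothetical.

```python
def nacl_at_volume(volume_ml: float, gradient_ml: float = 30.0,
                   start_mM: float = 0.0, end_mM: float = 250.0) -> float:
    """NaCl concentration (mM) after volume_ml of an ideal linear gradient has been delivered.

    Ignores column dead volume and mixer delay (assumption).
    """
    if not 0 <= volume_ml <= gradient_ml:
        raise ValueError("volume outside the gradient")
    return start_mM + (end_mM - start_mM) * volume_ml / gradient_ml


flow_ml_per_min = 1.0
for minutes in (6, 12, 24):
    volume = flow_ml_per_min * minutes
    print(minutes, nacl_at_volume(volume))  # 50.0, 100.0 and 200.0 mM
```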

[0086] The N-terminus of each of these proteins was determined by Edman degradation. Only two different amino acid
sequences were determined from the six proteins N-terminally sequenced. The N-terminus of the celE gene product
was homologous with four of the proteins identified and the N-terminus of the celB gene product was homologous
with the two remaining protein bands. The amino acid sequence information served first to identify the genes that were
expressing the cellulases useful for the applications. Second, the N-terminal sequences were compared with the protein
sequences in GenBank using the Basic Local Alignment Search Tool (BLAST; Jauris et al., 1990). This confirmed
that the two proteins sequenced belonged to the glycosyl-hydrolase family. The celB gene product has an amino-terminal
sequence which shares significant homology with a general class of xylan degrading enzymes referred to as
Family F beta-glycanases (Gilkes et al., 1991) or Family 10 glycosyl-hydrolases (Henrissat, 1991). The CelE gene product
shares homology with family E beta-glycanases/Family 9 glycosyl-hydrolases.

Strategy for the cloning of the cellulase genes 

[0087] Our strategy for identifying the Tok7B.1 glycolytic genes was to employ two approaches simultaneously: 1)
polymerase chain reaction (PCR), with primers based on the sequence information obtained from the BLAST search,
was used to amplify gene sequences from the Tok7B.1 genomic DNA preparations; and 2) an expression library of
the Tok7B.1 genomic DNA was constructed and screened for the expression of proteins able to degrade CMC.

Methods and Prior Art 

[0088] Agarose gel electrophoresis, plasmid isolation, M13mp10 single-stranded DNA isolation, use of DNA modifying
enzymes and E. coli transformation were performed as described by Sambrook et al. (1989).

Genomic DNA Preparation 

[0089] Tok7B.1 genomic DNA was prepared from a cell culture which had been grown under anaerobic conditions
for 1-2 days without shaking at 70°C in 2/1 medium. Cells were harvested from the growth medium by centrifugation at
5000 rpm for 10 minutes, then resuspended in 50 ml TES buffer before a second centrifugation step. Cell pellets were
then resuspended in 5 ml 50 mM Tris pH 8.0, mixed with 374 µl 0.5 M EDTA and incubated for 20 minutes at 37°C. After
the addition of 550 µl freshly prepared lysozyme (10 mg/ml), the mixture was incubated at 70°C for 20 minutes, mixed
with 250 µl Streptomyces griseus protease (40 mg/ml) and 310 µl 10% SDS, then left to incubate overnight at 70°C.
After allowing the lysed cells to cool to room temperature, the resulting clear solution was phenol extracted 2-5 times
until no material could be seen to partition at the interface. The remaining volume of the sample was estimated and a
1/10 volume of 3 M sodium acetate was added and mixed, then 2.5 volumes of 95-100% ethanol were gently layered onto
the top of the sample. DNA could be seen as a stringy white precipitate at the interface of the two liquids and could
be removed by spooling onto the end of a Pasteur pipette. Spooled DNA was transferred into a 1.5 ml microcentrifuge
tube and washed in 70% ethanol before air drying for 1-3 hours. The resulting DNA pellet was resuspended in TE
buffer and left overnight to fully dissolve. All genomic DNA preparations were stored at 4°C.
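Two of the additions in [0089] are simple mixing calculations. The sketch below computes the final EDTA concentration after adding 374 µl of 0.5 M EDTA to the 5 ml cell suspension (ignoring any volume contributed by the cells), and the sodium acetate and ethanol volumes for a given sample volume. The text does not say whether the 2.5 volumes of ethanol are reckoned on the sample before or after the acetate addition; the code assumes after. Function names and the 10 ml example are hypothetical.

```python
def final_mM(stock_M: float, added_ml: float, starting_ml: float) -> float:
    """Final concentration (mM) after adding added_ml of a stock to starting_ml of solution."""
    return stock_M * 1000 * added_ml / (starting_ml + added_ml)


def precipitation_volumes(sample_ml: float):
    """Volumes (ml) of 3 M sodium acetate and ethanol for precipitation.

    1/10 volume of sodium acetate, then 2.5 volumes of ethanol.
    Assumption: the 2.5 volumes are reckoned on the acetate-adjusted sample.
    """
    acetate_ml = sample_ml / 10
    ethanol_ml = 2.5 * (sample_ml + acetate_ml)
    return acetate_ml, ethanol_ml


print(round(final_mM(0.5, 0.374, 5.0), 1))  # EDTA after mixing: 34.8 mM
print(precipitation_volumes(10.0))          # hypothetical 10 ml sample: (1.0, 27.5)
```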

Isolation of the Tok7B.1 celE gene using consensus PCR and genomic walking PCR

[0090] The Tok7B.1 celE gene product, CelE, was identified by amino-terminal sequencing of cellulolytic peptides secreted
by Tok7B.1 (Figure 2). The celE gene codes for a family 9 glycosyl hydrolase, based on comparison to translated gene sequences
in the GenBank database. The CelE peptide sequence shared highest similarity to family 9 glycosyl hydrolases from
other thermophilic Clostridial microorganisms. Homology alignments of family 9 genes indicated that it would be possible
to design consensus oligonucleotide primers which would bind to DNA coding for clusters of highly conserved
amino acids found in all thermophilic Clostridial family 9 glycosyl hydrolases. These consensus primers could then be
used in PCR to amplify family 9 glycosyl hydrolase genes from Tok7B.1. Two primers were designed: the first, tokcela,
bound to DNA coding for the peptide sequence QKAIMFYEF, and the second, tokcelr, bound in the reverse orientation (with
respect to the gene sequence) to DNA coding for the peptide sequence DYNAGFVGAL (Figure 3).
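The consensus primers are described by the peptides they encode rather than by sequence (the actual tokcela and tokcelr sequences are shown in Figure 3, which is not reproduced here). A common way to derive such primers is degenerate back-translation, sketched below in Python: for each codon position it takes the IUPAC code covering all synonymous codons, and it reverse-complements the antisense primer. This is an illustration only; the real primers may have been designed differently (for example with codon-usage bias, inosine, or trimmed 3' ends), and the function names are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code; codons enumerated in TCAG order (third base varies fastest).
_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
       "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
_CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

# IUPAC nucleotide code for each set of bases.
_IUPAC = {frozenset(bases): code for bases, code in (
    ("A", "A"), ("C", "C"), ("G", "G"), ("T", "T"),
    ("AG", "R"), ("CT", "Y"), ("CG", "S"), ("AT", "W"), ("GT", "K"), ("AC", "M"),
    ("CGT", "B"), ("AGT", "D"), ("ACT", "H"), ("ACG", "V"), ("ACGT", "N"))}
_SIZE = {code: len(bases) for bases, code in _IUPAC.items()}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def back_translate(peptide: str) -> str:
    """Degenerate DNA: per codon position, the IUPAC code for all synonymous bases.

    For six-codon residues (L, R, S) this over-covers, e.g. L -> YTN also matches TTT/TTC (F).
    """
    out = []
    for aa in peptide:
        codons = [c for c, a in zip(_CODONS, _AA) if a == aa]
        out.extend(_IUPAC[frozenset(c[i] for c in codons)] for i in range(3))
    return "".join(out)


def degeneracy(seq: str) -> int:
    """Number of distinct plain sequences a degenerate sequence stands for."""
    return prod(_SIZE[b] for b in seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement, complementing IUPAC ambiguity codes as well."""
    return seq.translate(_COMPLEMENT)[::-1]


fwd = back_translate("QKAIMFYEF")                        # sense orientation (not the real tokcela)
rev = reverse_complement(back_translate("DYNAGFVGAL"))  # antisense orientation (not the real tokcelr)
print(fwd, degeneracy(fwd))  # CARAARGCNATHATGTTYTAYGARTTY 768
print(rev, degeneracy(rev))
```

The degeneracy count shows why consensus primers are usually kept short or anchored on low-redundancy residues such as M and W.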
[0091] The tokcela and tokcelr primers were used to amplify an approximately 1300 bp PCR product from Tok7B.1.
This product was ligated into M13mp10 (Messing, 1983), transformed into E. coli strain JM101 and plated to give individual
recombinant plaques. In order to test whether the PCR product was generated from a single gene or from multiple
genes, PCR product was reamplified from individual plaques using the M13 forward and reverse primers, then mapped
by restriction digestion with Tsp509I. A total of 12 individual PCR products were restriction mapped and all showed
identical restriction patterns. Six of these PCR products were sequenced and all showed identical DNA sequence. These
data indicated that all cloned PCR products were amplified from a single family 9 glycosyl hydrolase gene present on
the genome of Tok7B.1. In order to obtain the complete celE gene sequence, new PCR primers were designed to allow
genomic walking upstream and downstream of the region covered by the 1300 bp PCR product (Figure 4A). Standard
subcloning and DNA sequencing techniques were used to obtain 6416 bp of DNA sequence containing the entire celE
gene sequence plus flanking upstream and downstream sequence (Figure 4B). The complete DNA sequence and
translated peptide sequence of the celE gene are given in Sequence #2.
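The restriction-mapping test in [0091] compares fragment-size patterns between clones. A minimal in silico version, assuming complete digestion of a linear, fully known sequence, is sketched below. Tsp509I is modelled as cutting before its AATT recognition site and only top-strand positions are counted, so the 4-nt overhangs are not represented. The helper names and test sequences are hypothetical.

```python
import re


def digest(seq: str, site: str = "AATT", cut_offset: int = 0) -> list[int]:
    """Fragment lengths (top strand) from a complete digest of a linear sequence.

    Tsp509I recognises AATT and cuts before the first A (^AATT), hence cut_offset=0;
    Sau3AI (^GATC) can be modelled the same way with site="GATC".
    """
    seq = seq.upper()
    # The lookahead finds every occurrence of the site, including overlapping ones.
    cuts = sorted({m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)})
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces produced by a site at the very start of the sequence.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def same_pattern(a: str, b: str, **kw) -> bool:
    """True if two products give the same set of fragment sizes."""
    return sorted(digest(a, **kw)) == sorted(digest(b, **kw))


print(digest("GGAATTCCAATTGG"))                # [2, 6, 6]
print(same_pattern("GGAATTCC", "CCAATTGG"))    # True
print(digest("AAGATCAA", site="GATC"))         # [2, 6]
```

Identical fragment sizes are necessary but not sufficient for identical sequence, which is why the text follows restriction mapping with sequencing of six products.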

Genomic Library construction and screening

[0092] Genomic DNA from Tok7B.1 was partially digested with the restriction endonuclease Tsp509I to give DNA
fragments in the size range of 6-8 kb. These fragments were then ligated into XhoI-digested λZapII (Stratagene, 11011
North Torrey Pines Road, La Jolla, CA 92037, USA), then packaged and plated according to protocols supplied by
Stratagene. Individual plaque isolates shown to contain genomic inserts using the blue/white lacZ complementation
system present in λZapII were replated, and a total of 1600 genomic insert containing plaques were screened for
thermophilic cellulase and xylanase activity at 70°C using the substrate overlay method of Teather and Wood (1982).
Cellulase activity was detected using the soluble cellulose derivative carboxymethyl cellulose (CMC). Plaques were
also screened for cellobiohydrolase activity using the chromogenic substrate methylumbelliferyl cellobioside (MUC)
as described by Saul et al. (1990).

[0093] Two positive λZapII plaques, designated W2-4 and N17, were isolated which expressed thermophilic xylanase
and/or cellulase activity (Figure 5A). These recombinant phage were converted to Bluescript SK- plasmids using the
standard ExAssist excision procedure described by Stratagene. Each plasmid was restriction mapped using a range
of restriction endonucleases. Common restriction endonuclease digestion patterns indicated that W2-4 and N17 contained
common overlapping DNA from the same region of the Tok7B.1 genome (Figure 5A).

DNA sequencing and sequence analysis of the Tok7B.1 celB and celA genes 

[0094] The recombinant DNA from W2-4 and N17 was partially sequenced by creating simple plasmid deletions
using known restriction sites within the plasmid insert (Gibbs et al., 1991). Initial DNA sequence homology comparison
data indicated a gene coding for a multidomain enzyme with a xylanase and a cellulase domain and several internal
cellulose binding domains (CBD). The genomic DNA contained in W2-4 was sequenced in full, as were portions of N17,
by subcloning and sequencing internal restriction fragments and using synthesized DNA oligonucleotide primers (primers
are listed in Table I). Analysis of the complete sequence of W2-4 showed that the DNA contained a complete gene,
celB, coding for a nine-domain protein designated CelB. The 5'-portion of a further gene was observed to lie upstream
of the celB gene. This gene, designated celA, shared at least 1 domain in common with the celB gene. The complete
coding sequence of celA was obtained using Genomic Walking PCR (GW-PCR) as described by Morris et al. (1994).
Representative GW-PCR products spanning the region of the celA gene are depicted in Figure 5B. The complete DNA
sequence containing the celA and celB genes is depicted in Figure 5C, with each gene shown according to its translated
domain structure. The complete DNA sequence and translated peptide sequence of the celA and celB genes are given
in Sequence #1. The translated product of the celB gene matches perfectly with two amino-terminal sequences obtained
for native cellulolytic peptides secreted by Tok7B.1 (Figure 2, peptides B2 and B4), implying that the celB gene
expresses one of the major cellulases secreted by Tok7B.1.
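The statement that the celB translation "matches perfectly" with the N-terminal peptides amounts to finding each Edman-derived peptide as an exact substring of a translated reading frame. The sketch below does this for the three forward frames using the standard genetic code. Because mature secreted proteins usually lack their signal peptide, a match would be expected some way downstream of the start codon rather than at residue 0. The sequences in the example are toy strings, not Tok7B.1 sequence, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (third base varies fastest).
_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
       "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame; codons with non-ACGT characters become X."""
    dna = dna.upper()
    return "".join(CODE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3))


def locate_peptide(dna: str, peptide: str) -> list[tuple[int, int]]:
    """(frame, residue index) of every exact match of peptide in the three forward frames."""
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return hits


print(translate("ATGGCTAAAGAATTT"))              # MAKEF
print(locate_peptide("ATGGCTAAAGAATTT", "KEF"))  # [(0, 2)]
```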

[0095] A complete summary of the protein domain structures of CelA and CelB is given in Figure 6. 
[0096] The complete celE gene was observed to code for a large multidomain, multicatalytic enzyme with a putative
length of 1751 amino acids (unprocessed), composed of at least 10 discrete functional domains based on homology
comparisons (Figure 6). The family 9 glycosyl hydrolase domain is the amino-terminal domain of full-length CelE,
while the central domains of CelE (domains 4-9, Figure 6) are virtually identical to the central domains of CelB
(domains 3-8, Figure 6), the only difference being the lengths of the individual PT linkers. The carboxy-terminal domain
of CelE (domain 10, Figure 6) is homologous to the carboxy-terminal endoglucanase domain (family 44 glycosyl
hydrolase) of ManA from C. saccharolyticus. That domain can degrade xylan as well as carboxymethylcellulose (Gibbs
et al., 1991), and activity assays have shown that the carboxy-terminal domain of Tok7B.1 CelE is also an endoglucanase
with weak xylanase activity.

Identification of further Tok7B.1 cellulase genes using GW-PCR: celC and celH

[0097] While obtaining the complete coding sequence of the Tok7B.1 celE gene, further ORFs were identified upstream
of this gene. Homology comparisons indicated that these genes also coded for cellulolytic enzymes. GW-PCR was used to
obtain DNA sequence from upstream of celE (Figure 7A). Two further genes were identified in this way. Both genes,
designated celC and celH, code for multidomain, multicatalytic proteins with the same general structure as CelA, CelB
and CelE. Because the DNA sequence obtained was not contiguous, long-template PCR (Expand Long Template PCR System,
Boehringer Mannheim, Australia Pty. Ltd.) was used to amplify DNA between the sequenced regions to confirm that they
were contiguous (Figure 7A). Approximately 13,500 bp of genomic DNA upstream of the celE gene was partially sequenced.

Identification of celF and celG

[0098] During the isolation of the complete celA gene sequence, the primer N17a was used as a genomic walking
primer. A number of PCR products were obtained which did not match DNA sequence already obtained for the celB
and celA genes. It was clear from these results that the N17a primer was CelB. Upstream of this second xylanase
domain a further gene was identified coding for an enzyme with a carboxy-terminal family 48 glycosyl hydrolase domain.
These genes were designated celF and celG, respectively (Figure 7B). Oligonucleotide primers specific to the
carboxy-terminal end of the celG gene and the amino-terminal end of the celF gene were synthesized and used in
combination with oligonucleotide primers that bound to DNA coding for the CBDs found in celA, celB, celE, celC and
celH. The PCR products amplified indicated that celG and celF code for proteins with the same basic domain structure
as the other Tok7B.1 cellulolytic genes. The amino-terminal domain of celG was not identified; the carboxy-terminal
domain of celF was identified as a family 48 glycosyl hydrolase with high homology to the carboxy-terminal domains of
celG and celC.

Transfer of Tok7B.1 genes into controlled-expression plasmid vectors

[0099] To facilitate the transfer of Tok7B.1 cellulase genes into controlled-expression plasmid vectors, the general
method of Gibbs et al. (1991) was used. PCR was used to amplify full-length cellulase genes (and portions of cellulase
genes). Oligonucleotide primers corresponding to each end of the gene were engineered to contain restriction sites
allowing directional ligation of the restriction-digested PCR product into plasmid multiple cloning sites. Table II lists
the oligonucleotides designed for PCR amplification and directional ligation of the various Tok7B.1 genes into
controlled-expression vectors. Each primer contains one or more restriction endonuclease sites to facilitate ligation
of the PCR product into plasmid vector predigested with the same restriction enzyme, resulting in an in-frame gene
fusion between each thermophilic gene and a signal peptide sequence encoded on the vector. The various genes and gene
fragments transferred into pJLA602 by this method are shown in Figure 8.
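Because directional cloning relies on each primer-borne restriction site cutting the PCR product only at its ends, the insert is normally checked for internal occurrences of those sites first. The sketch below assumes the standard recognition sequences of the enzymes named in this section and uses a hypothetical insert; it illustrates the check and is not the procedure of Gibbs et al.

```python
import re

# Recognition sequences (5'->3') of enzymes named in this section.
# All are palindromic, so scanning one strand finds every site. N = any base.
SITES = {
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "SphI": "GCATGC",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
    "NcoI": "CCATGG",
    "BstEII": "GGTNACC",
}


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start of every (possibly overlapping) occurrence of site in seq."""
    pattern = site.replace("N", "[ACGT]")
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def unique_cutters(seq: str) -> list[str]:
    """Enzymes from SITES that cut seq exactly once."""
    return [name for name, site in SITES.items() if len(site_positions(seq, site)) == 1]


# Hypothetical insert: an NdeI site at the 5' end and two internal BamHI sites.
insert = "CATATGAAAGGATCCTTTGGATCC"
print(site_positions(insert, SITES["BamHI"]))  # [9, 18]
print(unique_cutters(insert))                  # ['NdeI']
```

Here BamHI would also cut inside the insert, so it could not serve as a terminal cloning site for this fragment, whereas NdeI is unique. The NdeI site CATATG also contains an ATG, which is why it is convenient for placing a start codon at the 5' end of a gene.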

Phylogenetic analysis of Tok7B.1

[0100] The 16S SSU rRNA gene was isolated using PCR. A PCR product was generated using oligonucleotide primers
designed to amplify the 16S SSU rRNA gene from all known prokaryotic species. An approximately 1800 bp PCR fragment
was obtained, which was cloned into M13mp10 in the forward and reverse orientations and sequenced (Seq #3). The SSU
rRNA gene sequence obtained was compared with all genes in the GenBank database. Close homologs of the Tok7B.1 SSU
rRNA gene were aligned using the GCG multiple alignment program 'Pileup'. The resulting aligned sequence files were
subsequently analyzed using parsimony methods (Swofford, 1993). Figure 9 shows the phylogenetic position of Tok7B.1
within cluster D of the thermophilic Clostridia (Rainey et al., 1993).
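The parsimony criterion prefers the tree topology that explains the aligned sequences with the fewest substitutions. A minimal sketch of Fitch's small-parsimony algorithm, which scores one fixed topology, is shown below. It is not the program cited (Swofford, 1993), which also searches over topologies, and the four-taxon alignment is hypothetical.

```python
from typing import Union

# A rooted binary tree: a leaf name (str) or a (left, right) pair of subtrees.
Tree = Union[str, tuple]


def fitch_site(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch small parsimony for one alignment column: (candidate states, changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one substitution is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: dict[str, str]) -> int:
    """Minimum number of substitutions the tree needs to explain the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical four-taxon alignment; the topology with the lower score is preferred.
aln = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), aln))  # 4
```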

Cloning of individual genes into an E. coli expression vector

[0101] From the celE and celB genes, a number of new truncated genes have been constructed (Table III) containing
either an individual cellulase catalytic domain (CelE1) or catalytic domains connected to cellulose-binding domains by
linker sequences (CelE1/2, CelE1/2/3 and CelB4/5). Each of the genes has been individually expressed in E. coli using
the bacteriophage T7 RNA polymerase/promoter system (Studier and Moffatt, 1986).

Expression Cloning of the CelE Domains D2

[0102] The N-terminal CelE endoglucanase catalytic domain alone (Figure 6), and the same domain together with the
first cellulose-binding domain (CBD) (Figure 6), were used to construct the expression plasmids pcelE1 and pcelE1/2,
respectively. These celE gene domains were obtained from the M13mp10 clones M13-celE1 and M13-celE1/2. The first step
in the cloning process was the PCR amplification of domain 2, or of domains 2 plus 3, of the celE gene from Tok7B.1
genomic DNA (Figure 10). Unique restriction endonuclease sites were introduced by the PCR primers at the 5' and 3' ends
of the gene fragments. An SphI site was incorporated at the 5' end of the native gene at the predicted translational
start site, so that it encodes the translational ATG start codon, and BglII sites were incorporated at the 3' ends of
the specific gene domains at convenient locations. Translational stop codons were introduced just upstream of the BglII
sites. The PCR fragments were blunt-end ligated directly into SmaI-digested M13mp10 vector (Messing, 1983) to give the
clones M13-celE1 and M13-celE1/2 (Figure 11).

[0103] E. coli expression plasmids were constructed using the pET9a vector (Novagen), which uses the T7 RNA
polymerase promoter for gene expression (Studier et al., 1990). An intermediate construct was employed to facilitate
the cloning process. The celE1/2/3 gene was amplified by PCR using the forward primer tokcbdf and the reverse primer
tokcel (Figure 11). The forward primer, tokcbdf, introduces an NdeI site at the 5' end of the mature celE gene and
thereby encodes the translational ATG start codon. The introduction of the NdeI restriction site changed the first two
amino acids encoded in the mature sequence from GT to AA (Table III). The reverse PCR primer, tokcel, was homologous
to the native gene sequence at the NdeI site in CBD domain 3. The PCR fragment was digested with NdeI and gel-purified
with silica gel technology using a Qiaex II gel extraction kit from Qiagen Inc. The fragment was ligated into the NdeI
site of the pET9a vector (Figure 12). The resulting plasmid, pMcelE-NdeI, was digested with PstI and BamHI, and the
vector fragment was isolated from the digest by agarose gel electrophoresis and silica gel purification. The M13-celE1
and M13-celE1/2 clones were digested with PstI and BglII, and the resulting celE gene fragments, celE1 and celE1/2,
were isolated from the digest by agarose gel electrophoresis and silica gel purification (Figure 12). The fragments
were ligated to the PstI/BamHI-digested pMcelE-NdeI plasmid to form the final clones, pMcelE1 and pMcelE1/2 (Figure 12).
BglII and BamHI produce compatible cohesive ends, but neither site is regenerated when a BglII end is ligated to a
BamHI end.
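The loss of both sites can be checked directly: BglII (A^GATCT) and BamHI (G^GATCC) both cut after the first base of their recognition sequence and leave the same GATC overhang, so a mixed junction reads AGATCC or GGATCT, which neither enzyme recognizes. A small Python sketch, assuming only these two recognition sequences and cut positions:

```python
# Recognition site and top-strand cut position (cut after this many bases).
ENZYMES = {"BglII": ("AGATCT", 1), "BamHI": ("GGATCC", 1)}


def overhang(enzyme: str) -> str:
    """5' overhang of a palindromic site cut symmetrically on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def hybrid_site(left_enzyme: str, right_enzyme: str) -> str:
    """Top-strand sequence at the junction of two ligated cohesive ends."""
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    if overhang(left_enzyme) != overhang(right_enzyme):
        raise ValueError("overhangs are not compatible")
    return left_site[:left_cut] + right_site[right_cut:]


for left, right in [("BglII", "BamHI"), ("BamHI", "BglII")]:
    junction = hybrid_site(left, right)
    recut_by = [name for name, (site, _) in ENZYMES.items() if site == junction]
    print(left, right, junction, recut_by or "not recut")
# BglII BamHI AGATCC not recut
# BamHI BglII GGATCT not recut
```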


Expression Cloning of the CelE D2/3/4/5 

[0104] The pcelE1/2/3 plasmid encodes the first catalytic domain of the celE gene plus the first two cellulose-binding
domains, D3 and D5 (Table III), in a pET9a expression vector. The catalytic domain D2 and CBD D3 used in the
construction of the pcelE1/2/3 expression plasmid were obtained from the pcelE1/2 plasmid. The second cellulose-binding
domain, D5, was obtained from the pRR9 plasmid (Figure 8). The construction of the final plasmid required a three-way
ligation, outlined in Figure 13.

[0105] The entire native celE gene was amplified by PCR from Tok7B.1 genomic DNA using the tocelef forward primer and
the tokceler3 reverse primer (Table II). The forward primer contained an SphI site, which introduces the ATG
translational start codon, and the reverse primer a SalI site. The PCR fragment was digested with SphI and SalI and
cloned into the SphI and SalI sites of the polylinker of the E. coli expression vector pJLA602 to produce the pRR9
plasmid (Figure 8). To obtain the gene fragment encoding domains 4 and 5 for ligation with the pcelE1/2 plasmid, the
region from the NcoI site in D3 through D5 was PCR amplified from the pRR9 plasmid (Figure 13). The forward primer,
Tokcelef, was homologous to the celE sequence at the NcoI site, and the reverse primer, tokcelebamr, was homologous to
the end of D5, the second CBD in celE, and introduced a BamHI cloning site. This PCR fragment was digested with NcoI
and BamHI and purified. The celE fragment extending from D2 to the NcoI site in D3 was isolated from the plasmid
pcelE1/2 (Figure 13): the plasmid was digested with NdeI and NcoI, and the celE fragment was separated from the vector
fragment by gel electrophoresis and silica gel technology. The vector, pET9a, was digested with NdeI and BamHI and
purified by gel electrophoresis and silica gel technology. The two celE fragments were ligated to the pET9a expression
vector in a three-part ligation to produce the pcelE1/2/3 plasmid (Figure 13).

Expression cloning of CelB4/5 

[0106] A plasmid expressing the CelB4/5 protein of the Tok7B.1 celB gene was constructed in the E. coli expression
vector pET9a, as described below. Domains 7, 8 and 9, containing a CBD and a catalytic domain, were PCR amplified from
Tok7B.1 genomic DNA using primers tokcbdf and tokcelbr. The forward primer introduced a unique 5' NdeI site and the
reverse primer a unique 3' BamHI site into the PCR fragment (Figure 8). The fragment was digested with NdeI and BamHI
and ligated into NdeI/BamHI-digested pJLA602 expression vector to produce the pRR6 plasmid (Figure 14). The pRR6
plasmid was digested with NdeI and BamHI, and the celB gene fragment was separated from the vector fragment by gel
electrophoresis and silica gel technology. The pET9a vector was digested with NdeI and BamHI and purified by gel
electrophoresis and silica gel technology. The two fragments were ligated together to produce the pcelB4/5 plasmid
(Figure 14).


Expression Cloning of CelB3/4/5 

[0107] The CBDs of the celE gene, domains 4 and 5 (Figure 8), are highly homologous to the CBDs of the celB gene,
domains 3 and 4 (Figure 8), and the two CBDs within each gene are also highly homologous to each other. This homology
is exploited to construct pcelB3/4/5 in the E. coli expression vector pET9a: a region homologous to domain 3 of the
celB gene is cloned from the celE gene construct, taking advantage of a BglII site in each of the homologous celE CBD
domains 4 and 5. This BglII fragment is isolated by restriction digestion from the celE construct pRR10, which encodes
domains 3, 4, 5 and 6 of the celE gene in the pJLA602 expression vector (Figure 8). The BglII fragment contains the 3'
portion of celE domain 3 and the 5' portion of celE domain 4, and is ligated into the BglII site of domain 4 (the CBD)
of pcelB4/5. The resulting plasmid is pcelB3/4/5.

Expression cloning of CelE3/B5 

[0108] This clone is constructed in the E. coli expression vector pET9a. Domain 3 of the celE gene is PCR amplified
from pcelE1/2/3. The forward and reverse primers introduce unique 5' NdeI and 3' BstEII sites into the PCR fragment.
The PCR fragment is digested with NdeI and BstEII and ligated to the pcelB4/5 vector, which has been digested with NdeI
and BstEII and gel purified (Figure 14A). Digestion of pcelB4/5 with NdeI and BstEII removes the native celB CBD as
well as 29 amino acids of the PT linker.

Fermentation of E. coli expressing cloned cellulase genes

[0109] The pcelE1, pcelE1/2, pcelE1/2/3, pcelB4/5, pcelB3/4/5 and pcelE3/B5 expression plasmids were transformed
into E. coli DE3-BL21 (Stratagene Corp.). Transformants were grown at 37 °C to an OD600 of 1.0 in 250 mL of L-broth
containing 50 µg/mL kanamycin. The 250 mL culture was then used to inoculate a 20 L Chemap fermentor containing 12 L of
medium. The fermentation medium consisted of 12 g/L tryptone, 24 g/L yeast extract, 2.3 g/L KH2PO4, 12.5 g/L K2HPO4,
1 mL/L Antifoam 289 (Sigma), 4 g/L glycerol, 1 mL/L 1.0 M MgSO4·7H2O and 50 µg/mL kanamycin. The transformants were
grown at 37 °C to an OD600 of approximately 12, and expression was then induced by the addition of IPTG to a
concentration of 95 mg/L. After a 3 h induction, the cells were harvested by centrifugation in 500 mL bottles at
7,000 x g for 10 min. A typical yield from a 12 L fermentation was 300 g of wet cell paste. Cell pellets were then
frozen at -80 °C prior to lysis and purification of the recombinant proteins.
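As a quick check of the induction conditions, 95 mg/L IPTG corresponds to roughly 0.4 mM, and a 12 L fermentation needs about 1.14 g. The sketch assumes IPTG is added as the free compound (isopropyl β-D-1-thiogalactopyranoside, C9H18O5S, 238.30 g/mol).

```python
IPTG_MW = 238.30  # g/mol, isopropyl beta-D-1-thiogalactopyranoside (C9H18O5S)


def mg_per_litre_to_millimolar(mg_per_litre: float, mw_g_per_mol: float) -> float:
    """mg/L divided by g/mol gives mmol/L (mM)."""
    return mg_per_litre / mw_g_per_mol


def total_grams(mg_per_litre: float, volume_litre: float) -> float:
    """Mass of inducer needed for the whole culture volume."""
    return mg_per_litre * volume_litre / 1000.0


print(round(mg_per_litre_to_millimolar(95, IPTG_MW), 2))  # 0.4
print(total_grams(95, 12))                                # 1.14
```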

Purification of the CelE1 and CelE1/2 Cellulases

[0110] The E. coli fermentation cell pellets were thawed by resuspending the frozen cells in two volumes of 20 mM
Tris buffer, pH 8.0. The cells were homogenized with a Virtis Virtishear 1200 for 20 min, then lysed by one passage
through a Microfluidizer (Microfluidics Corp.) at a pressure of 9600 psi. The lysate was centrifuged at 43,000 x g for
30 min. The pellet was discarded and the supernatant was combined with sufficient ammonium sulfate to make a 1 M
solution. The ammonium sulfate solution was stored overnight at 4 °C, then centrifuged at 15,000 x g for 20 min. The
supernatant was then chromatographed on Phenyl Sepharose. The column (5 x 10 cm) was washed with 10 mM Tris, pH 8.0,
1.0 M ammonium sulfate. After the A280 of the column effluent was below 0.1 AU, the protein was eluted with a 300 mL
linear gradient from 1.0 M to 0 M ammonium sulfate. This column eluate was used in the application testing. Each of
the constructs tested in the application was electrophoresed on a 12% polyacrylamide gel, blotted to an Immobilon
membrane and N-terminally sequenced. Figure 16 shows the expected N-terminal sequences versus the sequences found by
Edman degradation.


Purification of the CelB5 and CelB4/5 Cellulases

[0111] When the CelB4/5 purification described below is carried out in the presence of a protease inhibitor cocktail
consisting of phenylmethylsulfonyl fluoride (PMSF), EDTA and aprotinin, the full-length CelB4/5 protein, consisting of
the CBD, the PT linker region and the catalytic domain, is purified. In the absence of the protease inhibitor cocktail,
however, the linker region is cleaved to yield the CelB5 endoglucanase domain alone, without the CBD or PT linker.
[0112] For purification of CelB4/5, 280 g of cells expressing celB4/5 were thawed in three volumes of 10 mM Tris,
pH 7.0, in the presence of the protease inhibitor cocktail described above. The thawed cells were homogenized with the
Virtishear for 20 min, then lysed as before by a single pass through the Microfluidizer. The lysate was centrifuged for
10 min at 3,500 x g. The resulting supernatant (820 ml) was heated in a 50 °C water bath for 10 min, then centrifuged
for 20 min at 3,000 x g. Sufficient (NH4)2SO4 was added to give a 20% saturated solution; the solution was centrifuged
for 30 min at 3,000 x g and the pellet discarded. More (NH4)2SO4 was added to the supernatant until the solution was
35% saturated; the solution was centrifuged for 30 min at 3,000 x g and the supernatant discarded. The pellet was
resuspended in 10 mM Tris, pH 8.0, 0.5 mM EDTA, 1 mM aprotinin.

[0113] The solution was chromatographed on a 430 ml DEAE column (5 cm x 20 cm) and eluted with a two-step NaCl
gradient. Step one of the elution profile was a 0 to 150 mM NaCl wash in 300 ml; step two was a 150 mM to 260 mM NaCl
linear gradient in 1200 ml. The CMCase activity eluted between 750 and 950 ml and gave 1.5 g of CelB4/5 protein.
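The NaCl concentration at which CelB4/5 eluted can be estimated from the gradient program. The sketch below assumes that elution volumes are counted from the start of step one and that step one is itself a linear ramp from 0 to 150 mM; neither assumption is stated in the text, so the result is indicative only.

```python
def gradient_concentration(volume_ml: float, breakpoints: list[tuple[float, float]]) -> float:
    """Piecewise-linear interpolation of salt concentration (mM) at a given elution volume.

    breakpoints: (cumulative volume in ml, concentration in mM), in increasing volume order.
    """
    v0, c0 = breakpoints[0]
    if volume_ml <= v0:
        return c0
    for (v_start, c_start), (v_end, c_end) in zip(breakpoints, breakpoints[1:]):
        if volume_ml <= v_end:
            return c_start + (c_end - c_start) * (volume_ml - v_start) / (v_end - v_start)
    return breakpoints[-1][1]  # held at the final concentration after the gradient


# Assumed program: 0 -> 150 mM over the first 300 ml, then 150 -> 260 mM over 1200 ml.
DEAE_PROGRAM = [(0.0, 0.0), (300.0, 150.0), (1500.0, 260.0)]

for v in (750, 950):
    print(v, gradient_concentration(v, DEAE_PROGRAM))
```

Under these assumptions the 750 to 950 ml activity peak corresponds to roughly 191 to 210 mM NaCl.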

[0114] CelB5 was purified in the same manner, except that the only protease inhibitor added to the cell lysate
supernatant was 1 mM PMSF. CelB5 eluted from the DEAE column in the same way as CelB4/5. The total protein purified was
1 g from about 280 g of cells.

Purification of CelB3/4/5

[0115] 400 g of frozen cells are thawed in 800 ml of 10 mM Tris, pH 8.0, 0.5 mM EDTA. The cells are lysed by one
pass through the Microfluidizer at 12,000 psi. The lysed sample is then centrifuged at 7,800 x g for 50 min. To the
supernatant (950 ml), 100.7 g of (NH4)2SO4 is added slowly to give a 20% saturated solution. The solution is stirred
overnight (12 h) at 4 °C. The precipitated proteins are removed by centrifugation for 30 min at 14,000 x g. The
remaining supernatant is brought to 40% (NH4)2SO4 saturation and stirred for 48 h at 4 °C. The precipitate is pelleted
by centrifugation for 30 min at 15,000 x g. The pellet is resuspended in 20 mM MES, pH 6.0. The conductivity is reduced
to less than 3 ohms/cm2 by diafiltration using a 30 kDa Filtron membrane. The dialysate is centrifuged to remove any
precipitate, chromatographed on S-Sepharose (10 cm x 6 cm) and eluted with a linear salt gradient from 0.1 M to 0.35 M.
Fractions containing more than 200 units/mL of activity are pooled. The final pool contains 720 mg of protein, which is
approximately 52% pure as determined by densitometric scanning of a Coomassie-stained 12% SDS-PAGE gel of the pool.
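The amount of solid ammonium sulfate needed to reach a given saturation can be estimated with an empirical formula. The sketch assumes the commonly quoted 0 °C form, grams per liter = 533(S2 - S1)/(100 - 0.3 S2); published nomograms differ with temperature and source, so it will not necessarily reproduce the 100.7 g used above, which corresponds to 106 g per liter of supernatant.

```python
def ammonium_sulfate_grams(volume_ml: float, s1: float, s2: float,
                           k: float = 533.0, c: float = 0.3) -> float:
    """Approximate grams of solid (NH4)2SO4 to take a solution from s1 % to s2 % saturation.

    Assumes g per litre of starting solution = k * (s2 - s1) / (100 - c * s2);
    k = 533 and c = 0.3 are a commonly quoted form for 0 degC.
    """
    if not 0 <= s1 < s2 <= 100:
        raise ValueError("expected 0 <= s1 < s2 <= 100")
    grams_per_litre = k * (s2 - s1) / (100.0 - c * s2)
    return grams_per_litre * volume_ml / 1000.0


# The 950 ml supernatant above, taken from 0 % to 20 % saturation.
print(round(ammonium_sulfate_grams(950, 0, 20), 1))
```

For the 950 ml supernatant the formula gives about 108 g, roughly 7% more than the quantity actually used, which is within the usual spread between nomograms.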

Purification of CelE3/B5 

[0116] 400 g of E. coli DE3-BL21 cells are thawed in 10 mM Tris, pH 8.0, 0.5 mM EDTA. The cells are lysed by passage
through the Microfluidizer at 12,000 psi. The precipitate is removed by centrifugation of the lysate for 30 min at
8,000 x g. Solid (NH4)2SO4 is then added to the supernatant to give a 20% saturated solution. The precipitate is
removed by centrifugation at 14,000 x g for 30 min, and the supernatant is loaded on a Phenyl Sepharose column
(6 cm x 10 cm). The protein is eluted with a 2 L descending linear gradient from 1 M to 0 M (NH4)2SO4 in lysis buffer.
The bulk of the activity is collected in three fractions of 250 ml each, and each fraction is assayed for activity.
[0117] The conductivity is reduced to less than 3 ohms/cm2 by diafiltration against 10 mM imidazole, pH 7.0, using a
30 kDa Filtron membrane. The dialysate is chromatographed on S-Sepharose (10 cm x 6 cm) and eluted with a linear salt
gradient from 0 M to 0.23 M. Fractions containing more than 250 units/mL of activity are pooled. The final pool
contains 720 mg of protein, which is approximately 86.9% pure as determined by densitometric scanning of a
Coomassie-stained 12% SDS-PAGE gel of the pool.

pH rate profiles of purified Cellulases 

[0118] The pH rate profiles and thermostability of the cellulases were determined. These data define the pH extremes
at which an enzyme could be used in an application. For the pH rate profiles, the cellulases were assayed at 50 °C.
The catalyzed rate of reaction at each pH is expressed as a fraction of the fastest observed rate, calculated by
dividing the rate at each pH by the highest rate observed at any pH; the highest rate is therefore plotted as 1.0. The
CMC substrate was prepared in an appropriate buffer for each pH tested. The following buffers were used: pH 3.0,
sodium tartrate (25 mM); pH 4.0, sodium tartrate (50 mM); pH 5.0, sodium acetate (50 mM); pH 7.0, sodium phosphate
(50 mM); pH 9.0, glycine (50 mM); pH 10.0, glycine (50 mM); pH 11.0, CAPS (50 mM); pH 12.0, sodium phosphate (50 mM).
2% CMC was made up at each pH in the buffers listed. No more than 10 µL of enzyme was added to the 0.5 ml total
reaction mixture, so that the pH of the reaction would not be affected.
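The normalization described above divides each rate by the fastest rate observed at any pH. A Python sketch, using the buffer assignments from the text and hypothetical rates:

```python
# Buffers used for each assay pH, from the text above: (buffer, concentration in mM).
PH_BUFFERS = {
    3.0: ("sodium tartrate", 25), 4.0: ("sodium tartrate", 50),
    5.0: ("sodium acetate", 50), 7.0: ("sodium phosphate", 50),
    9.0: ("glycine", 50), 10.0: ("glycine", 50),
    11.0: ("CAPS", 50), 12.0: ("sodium phosphate", 50),
}


def relative_rates(rates_by_ph: dict[float, float]) -> dict[float, float]:
    """Divide each rate by the fastest observed rate, so the optimum plots as 1.0."""
    unknown = set(rates_by_ph) - set(PH_BUFFERS)
    if unknown:
        raise ValueError(f"no buffer defined for pH {sorted(unknown)}")
    fastest = max(rates_by_ph.values())
    if fastest <= 0:
        raise ValueError("at least one rate must be positive")
    return {ph: rate / fastest for ph, rate in rates_by_ph.items()}


# Hypothetical CMCase rates (arbitrary units), not measured values.
print(relative_rates({5.0: 2.0, 7.0: 4.0, 9.0: 1.0}))  # {5.0: 0.5, 7.0: 1.0, 9.0: 0.25}

# Enzyme volume as a fraction of the 0.5 ml reaction: at most 10 ul, i.e. 2 %.
print(10 / 500)  # 0.02
```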


Thermal Stability of Cellulases 

[0119] The thermal stability of these proteins is summarized in Table IV. Adding CBDs to the catalytic domains affects
the thermal stability of the constructs differently. CelE1 was dramatically stabilized by the addition of the
cellulose-binding domain: there is a 25 °C increase in the thermal stability of CelE1/2 relative to CelE1.
[0120] Assays to determine the thermostability of the cellulases over time were carried out in one of two ways,
depending on the incubation temperature and time. At temperatures of up to 80 °C, or if the samples were incubated for
less than two minutes, stability studies were done by protocol 1. An aliquot (40 µL) of the purified cellulase was
diluted into an aliquot (200 µL) of incubation buffer (50 mM sodium phosphate, pH 7.0) that had been preheated in an
80 °C water bath. At the specified time points, aliquots (25 µL) were withdrawn from the diluted sample incubated at the
designated temperature and diluted into 475 µL of ice-cold incubation buffer. Each time point was then assayed for
remaining cellulase activity using the standard CMCase assay.

[0121] Protocol 2 was used for incubations above 80 °C in which any time point exceeded two minutes of incubation. In
this case, sufficient cellulase for an individual CMCase assay was placed in a tube and preheated to 80 °C. At time 0
the samples were transferred to a water bath at a higher temperature, for example 85 °C or 90 °C. At the designated
time points the samples were withdrawn and placed in an ice-water bath. Each sample was then assayed for remaining
cellulase activity using the standard CMCase assay.
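If inactivation is assumed to be first order, the residual activities from either protocol can be reduced to a rate constant and a half-life by a least-squares fit of ln(activity) against time. The text does not state how the data were fitted, so the sketch below is an illustration under that assumption, with hypothetical data; it also records the overall dilution of protocol 1.

```python
import math

# Protocol 1 dilutions from the text: 40 ul into 200 ul, then 25 ul into 475 ul.
# They cancel when activity is expressed relative to time 0, but give the overall factor.
OVERALL_DILUTION = (40 + 200) / 40 * (25 + 475) / 25  # 6 x 20 = 120


def first_order_fit(times_min: list[float], activities: list[float]) -> tuple[float, float]:
    """Fit ln(activity) = ln(A0) - k*t by least squares; return (k per min, half-life in min).

    Assumes first-order (single-exponential) inactivation.
    """
    if len(times_min) != len(activities) or len(times_min) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(a) for a in activities]
    t_mean = sum(times_min) / len(times_min)
    y_mean = sum(logs) / len(logs)
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
    k = -sxy / sxx
    if k <= 0:
        raise ValueError("activity did not decay; half-life undefined")
    return k, math.log(2) / k


# Hypothetical residual activities (% of time 0) with a 10 min half-life.
times = [0.0, 5.0, 10.0, 20.0]
residual = [100 * 0.5 ** (t / 10) for t in times]
k, t_half = first_order_fit(times, residual)
print(round(t_half, 3))  # 10.0
```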

Structural characterization of purified cellulases 

[0122] Characterization of the CelB5 protein by MALDI-TOF and N-terminal sequencing shows that the linker domain is
clipped between T999 and A1000 of the full-length CelB protein sequence, and that the two C-terminal amino acids,
K1424 and N1425, are also removed by proteolysis (Figure 15). The N-terminal sequences of the expressed proteins were
determined using the technique of Matsudaira (1987), in which proteins are electrophoresed by SDS-PAGE, blotted to
PVDF membranes and then N-terminally sequenced by Edman degradation (Figure 15).
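Masses observed by MALDI-TOF can be compared with masses predicted from the sequence. The sketch assumes standard average residue masses, unmodified residues and free termini; MALDI-TOF usually reports the protonated ion, about 1 Da heavier. For example, removal of K1424 and N1425 should lower the mass by roughly 242 Da.

```python
# Average residue masses (Da) of the 20 standard amino acids (peptide-bond form).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Average mass (Da) of an unmodified polypeptide; ignores PTMs and disulfides."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def mass_lost(removed_residues: str) -> float:
    """Mass removed when residues are clipped from a terminus (the terminal water stays)."""
    return sum(RESIDUE_MASS[aa] for aa in removed_residues.upper())


print(round(average_mass("GG"), 2))  # 132.12
print(round(mass_lost("KN"), 2))     # 242.28
```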

Application Testing of Tok7B.1 Cellulase Constructs

[0123] The purified enzymes were tested in the denim stonewash application under the same conditions that were
used in the initial evaluation of the cellulase supernatants. Results are shown in Table IV. The cellulase constructs
that gave a stonewash effect, with abrasion increasing in a dose-dependent manner with enzyme concentration, were
those lacking a cellulose-binding domain. The CelB5 and CelE1 constructs gave the best stonewash effect.
Annex to the description



SEQUENCE LISTING

(1) GENERAL INFORMATION 

(i) APPLICANT

(A) NAME: CLARIANT FINANCE (BVI) LIMITED 
(B) STREET: Citco Building, Wickhams Cay, P.O. Box 662

(C) CITY: Road Town 

(D) STATE OR PROVINCE: Tortola 

(E) COUNTRY: British Virgin Islands 

(F) POSTAL CODE:

(ii) TITLE OF THE INVENTION: Truncated Cellulase Compositions

(iii) NUMBER OF SEQUENCES: 47 

(iv) COMPUTER-READABLE FORM:
(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ for Windows Version 2.0 






(v) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: 98810919.5

(2) INFORMATION FOR SEQ ID NO:1:



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11707 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CCAATCTGTG TCATGTGCTG AAACAGCGGT TTGCTGTACA CACTCGATGT GCCCACCGGC 60

GAAAAATCAA ATACAGAAAT ATATCGCCAG CTCTGAGCAG TTTTCCATCT TCTATTCGCA 120

GCAAATTTCA AAATGATTCT GGTTTATCTT ACTCGTACGC GTCTTTGTCA GCTCCTGCAA 180

TTGTTGATGA TGTTCGCTCG ATACTTACAT TCTGGCAGGA TTCGATATTA AGCAAAAAAG 240

ATGAGATTAG AAACATGGTT GGTGGAGATT GGAAAAAACC TCCTGCAGAG CAGGTTGTTG 300

CTGGACCACC TGCTGAATAC AAGTGGTATG CAACTGCTCA AATCAATGAC AGCGATTTTT 360

ACAACTCAAA TCTCATACCT CCGTTGCAAA GTGGTGACAG TCTCGTACTT ATGACAACAC 420

AGGGTATTGA TATGAGTCCT TCTGGAAATG TAATTAGAAA TGGTGTTTTT ATTTCACTTG 480

CTGAATATAC AGGATTCAAT GTCAATAGCA ACGGTGATCT AAAAATTATA TGGGACAGAC 540

CGAGCCAGCA AACAATAAAT GAAATTACCA ATGATTTGAA TTTGCCAATT GTTCCAACAC 600

CAACGCCTGT GCCAACAGCA AATGTAACCA CGGGTACCAC AAACAATTTC CAAATAATAA 660

ATCGAAGAAT GAGTATAGAG TAAGAAGGTT ATATTTTAAA ATAGTAGTCA AAAAGGGAAG 720

TGAGGAAGAT GAAGAAGAGG GTAATTTCAA TTCTTTCTTT ATTGTTTTTT TTAATAAACA 780

CGCTTGTAGG TACTTTGATA TTTCATCAGG AAGCAAAAGC AGCAGCATAT ACTGTTGATT 840

TTGAAGGTGC TGATACTTTA TCTTACTTTG CTTATGGAAA ATCGAGCATA GCAGTTGACA 900

TGGGCAATGC ATATAATGGT AAAAGTAGTG TCAGGGTGTC AAATAGAAGT TCAATATGGG 960

ATGGAGTTGC AGTTGACGTT AAAAACATTA TGAACAATGG AACCACATGG GTAGTTTCAG 1020

CGTATGTAAA ACATAGCTAC CAGAAGCCGG TTGCATTTGG TATCTCAGCG GTTTACGACG 1080

ATGGAAGTGG GGTTAAGAGT ACTCTCATAG GTGAGGTTGT GGCTATTCCA AATTATTGGA 1140

AGAAAATTGT TGGTAAATGG ACTCCAAATA TTAGCAATGT CAGGAATTTG TTAATTGTAA 1200

TACACACAAT TGTAGAAAGC GAAGTAGATT ATAATGTTGA CTATATCCAA ATAATGGATG 1260

ATAATAGTTA CCTATCAAAT GCAGTGACAT TTTCAAGTGG ATTTGAAAGT GGCACTACCG 1320

AGGGTTGGCA GGCAAGGGGA AGCGGTGTTA CAGTAAAACC AGATAGCGTT GTGGCATATA 1380

GTGGCAAGTA TAGTTTGTAC GTCAGTGGAA GAACGTCAAA TTGGCATGGT GCACAGATTC 1440

CGGTAGATAC AATTTTGGAA CAGGGTAAAG TGTATAAAAT AAGTGTTTGG GTTTATCAGA 1500 

ACAGTGGTTC AACTCAAAAA ATGTCATTAA CTATGCAAAG AAGATTTGCT ACAGATCCTT 1560 

CAACAAGCTA TGAAAATCTG ATATATAACA GGGATGTACC GAGTAATACG TGGGTTGAGC 1620

TGAGTGGAAG CTACTCAATT CCTGCTGGTG TTACAGTTAG CGAGTTGTTG CTTTATGTTG 1680 

AGGCACAAAA TGCAAATTTG GCTTTCTGGG TTGATGATTT AAAGATTTAT GATTTATCCA 1740 

AGTTGGCTGA ACCTGAATGG GAGATACCAT CTTTGATAGA AAAGTATAGA GATTATTTCA 1800 

AAGTAGGAGT AGCTTTGTCT TACAAAAGCA TTGCCTCTGA TACAGAAAAG AAGATGGTTT 1860 

TGAAGCATTT CAATAGTATT ACTGCAGGGA ACGAAATGAA ACCATCAGAG TTACTTGTCG 1920 

ATGAAAATAC TTACAACTTT AGCAAAGCAG ACGAATTTGT AAATTTTGCA ACAAGTAACA 1980

ACATTGCCAT CAGAGGTCAT ACACTGGTTT GGCATGAGCA AACACCCGAC TGGTTTTTCA 2040 

AGGACACAAA TGGAAATACG TTGAGCAAGG ATGCATTGCT AAGCAGATTA AAACAGTATA 2100 

TTTATACGGT AGTGGGAAGA TATAAAGGGA AGGTTTATGC ATGGGATGTG GTAAATGAAG 2160

CAATAGATGA AAGTCAAGGT GATGGATTCA GGAGATCTAA CTGGTACAAC ATTTGTAGTC 2220 

CCGAATATAT TGAGAAGGCT TTTATATGGG CACATGAAGC CGATCCAGAC GCAAAATTGT 2280 

TTTACAACGA TTACAACACA GAAAACAGTC AGAAGAGACA GTTTATTTAC AACATGATTA 2340

AGAGTCTCAA GGAAAAAGGT GTTCCAATTC ATGGAATAGG ATTGCAGAGT CATATAAATC 2400 

TTGATTGGCC CTCGATTAGC GAGATAGAGA ACACCATAAG ATTGTTCAGC TCTATACCTG 2460

GATTGGAGAT ACACATTACG GAGCTTGATA TGAGTTTTTA TCAGTGGGGT TCGAGTACCA 2520 

GTTACTCAAC GCCACCAAGA GATCTCCTGA TAAAACAGGC AATGAGATAT AAGGAGTTAT 2580

TTGATTTGTT TAAAAAGTAC AACAATGTAA TAACAAGTGT AACATTCTGG GGACTGAAGG 2640

ATGATTACTC ATGGCTGAGT CAAAACTTTG GAAAAAGTGA TTACCCGTTG TTATTTGATG 2700 

AAAACTATAA ATCAAAATAT GCCTTTTGGA GCCTGATTGA GCCAACTGTG ATACCGGCCA 2760 

ACTCAACATT GCCAGCACCA CCAGCTATTC AAATACCTAC ACCAACTCCC ACACCAACCC 2820

CGACACCGAC AGTGAGTGCA ACGCCAACAC CAGCACCGAC GGCATCACCG GTAGGTGGCA 2880 

GTTACTGGAC GCCGAGTGAG AGTTACAGTG CGCTGAAGGT ATGGTATGCG AATGGGAATT 2940 

TAAGCAGCCC GACGAATGTA TTGAATCCTA AGATAAAGAT AGAGAATGTT GGGACGACAG 3000

CGGTAGATCT TAGCAGGGTG AAGGTAAGAT ACTGGTACAC GATAGATGGT GAGGCAACAC 3060 

AGAGTGTAAG TGTAACAAGC AGCATAGATC CTGCGTATAT AGATGTGAAG TTTGTGAAGC 3120

TTGGAGCGAA CGCAGGCGGA GCGGATTACT ATGTGGAGAT AGGCTTTAAG AGTGGAGCAG 3180

GGGTTTTGGC AGCAGGGCAA AGCACGAAGG AGATAAGACT TAGCATACAG AAGGGCAGTG 3240 

GCAGCTACAA TCAGTCAAAT GACTATTCGG TGAGGAGTGC AACAGGCTAT ATAGAGAACG 3300

AGAAGGTAAC AGGGTATATA GATGATGTAC TTGTATGGGG AAGAGAGCCG AGCAGGAACG 3360

CCCAGATCAA GGTATGGTAT GCGAATGGGA ATTTAAGCAG CCCGACGAAT GTATTGAATC 3420 

CTAAGATAAA GATAGAGAAT GTTGGGACGA CAGCGGTAGA TCTTAGCAGG GTGAAGGTAA 3480

GATACTGGTA CACGATAGAT GGTGAGGCAA CACAGAGTGT AAGTGTAACA AGCAGCATAA 3540 

ACCCTGCGTA TATAGATGTG AAGTTTGTGA AGCTTGGAGC AAATGCAGGT GGAGCGGATT 3600 

ACTATGTGGA GATAGGCTTT AAGAGTGGAG CAGGGGTTTT GGCAGCAGGG CAGAGCACGA 3660

AGGAGATAAG ACTTAGCATA CAGAAGGGCA GTGGCAGCTA CAATCAGTCA AATGACTATT 3720

CGGTGAGGAG TGCAACAGGC TATATAGAGA ACGAGAAGGT AACGGGGTAT ATAGATGGTG 3780 

CGATAGTGTG GGGAAGAGAG CCGAGCAGGG GTACAAAGCC GGCGGGAGTA GTAACACCGA 3840 

CACCGGCACC GACCCCGACA TCGACGCCGA CACCAACACC TACAACCACA CCTGCACCGA 3900 

CATCAGCCCC GACACCGAGC CCAACAGTGA CAGCAACGCC GACTCCAACG CCGACGCCGA 3960 

CAGTGACGGT TACTGTGACT CCGACACCGA CACCAACACC GACGCCGACA CCGACAGGGA 4020

CACCTGGCAC GGGAAGTGGT TTGAAGGTAC TATACAAGAA CAATGAGACA AGTGCGAGCA 4080 

CAAGTTCTAT AAGGCCGTGG TTTAAGATAG TGAATGGAGG CAGCAGCAGT GTTGATCTTA 4140 

GCAGGGTTAA GATAAGATAC TGGTACACAG TGGATGGTGA CAAGCCACAG AGTGCGGTAT 4200

GTGACTGGGC ACAGATAGGG GCAAGCAATG TGACATTCAA TTTTGTGAAG CTGAGCAGCG 4260 

GAGTGAGTGG AGCGGATTAT TACTTGGAGG TAGGATTTAG CAGTGGAGCT GGGCAGTTGC 4320 

AGCCTGGTAA GGACACAGGG GATATACAGG TAAGGTTTAA CAAGAATGAC TGGAGCAATT 4380

ACAATCAGGC AGACGACTGG TCATGGTTGC AGAGCATGAC GAATTATGGA GAGAATGCGA 4440

AGGTAACGCT GTATGTAGAT GGTGTTCTGG TATGGGGGCA GGAGCCGGGC GGAGCGACAC 4500 

CTGCACCGAC AAGCACAGCA ACACCAACGC CAACTCCGAC AGCAACAGCA ACACCGACGC 4560 

CGACAGCAAC GCCAACGTCT ACACCGACAC CGACAGCAAC ACCAACCCCA ATACCAACAC 4620

CCACAACGCC TCCTACAAAA CCGGTGGGTA AGATTCCACC AAATAACAAC CCGCTGATTT 4680

CACACAAGTT CGGTGCGGAC CCGGCAGTCC TTGTTTATGG TGGCAGAGTT TATATGTATC 4740 

TTACAAATGA CATTCTGGAG TATGATGAAA ATGGAAATGT GAAGGATAAC TCATACAGCA 4800 

AAATAAACAA AATAACAGTT ATATCATCGG ATGACCTTGT AAACTGGACA GACCATGGCG 4860

AGATTGAAGT TGCAGGTCCG AACGGGGTTG CAAAATGGGC AAGTCTTTCA TGGGCACCGG 4920

CTGTTGCATG CAAAAAGATT AACGGAAAAG ACAGGTTCTT CCTTTACTTT GGCAACAGCG 4980

GTGGTGGCAT AGGTGTAATA ACGGCAGACT CACCAACCGG TCCGTGGTCA GACCCGCTTG 5040

GAAGACCGCT TATCACATGG TCAACACCCG GTGTGCAGGG TGTTGTCTGG TTGTTTGACC 5100

CTGCAGTGCT GGTGGATGAT GACGGGAAAG CATATATTTA TTTTGGTGGA GGAGTTCCAC 5160 

AGGGGCAGGA TGCTATGCCA AACACGGCAC GTGTGATGCA GCTGGGAGAT GATATGATAA 5220 

GTGTTGTTGG GAGTGCTGTT ACAATTCCAG CACCATACAT GTTTGAGGAT TCCGGGATAA 5280

ACAAGATAGG GAATACCTAC TATTACTCCT ACTGCACAAA CTTTGCACAA AGACCGCAGG 5340 

GCAGCCCACC GGCGGGTGCT ATAGCGTACA TGACAGGCAG AAGTCCAATG GGACCCTGGG 5400

AATACCGCGG GGTTATACTC AGAAATCCGG GGAATTTCTT TGGAGTTGGT GGCAATAACC 5460

ATCACCAGCT GTTTGAATTT AATGGCAAAT GGTATATTGC ATACCACGCA CAGACACTTG 5520

CAAAAGATTT GGGAGTTGCA AAGGGTTACA GGTCACCGCA TATAAACTAT GTGCAGATTG 5580

AAAATGGTAC GATAAAAAAA GTAACAGCCG ACTACAAAGG AGTGGCACAG GTGAAGAATT 5640 

TTGACCCGTA CAGGATGGTT GAGGCGGAGA CATTTGCATG GTGTGCAGGG ATTTCGACAA 5700 

AGAAGGCAAA TGCGAGCAAT AATATGTGCT TGACAGGTAT AGACAGTGGA GACTGGATTG 5760

CACTTTCCAA GGTTGACTTT GGTAATGCAG GTCCACAGAA ATTTGAGGCG CAGGTTTCCA 5820 

GCATCAACGG CAAAGGGTAT ATAGAACTCA GGATAGACTC GGTTGATGGT AGAACCATTG 5880 

CAGTTGCAGA GGTTCTGCCA CAGAGTGGTT CTTCTTCGCA GTGGGTCAAA GTAGAGGCAA 5940

ATGTTGAGAA TGTAACAGGT GTGCATGATT TGTATCTTGT GTTCAGAGGT GAAAAGAAGA 6000 

GCAACCTGTT TGACATGGAT TGGTGGAGAT TTGTGAGGTA AATAGCATTA GTCAACGCGA 6060 

GATATTAATA CTGCTTTAGC AGTCAGTAAA TGAATGAATA AAGGAATTTT AGCGGGGTAG 6120 

CACATCTATA GGAAAGATGT GCTGCTTCGC TAAAGTCCTA TATATGGGTG TTTCAAAAGT 6180 

AGCACAAAAG ATAATTGGTT TTAACAGTCA AAATGTACAA GTAAAAGTAA ACAAGCAGGA 6240

GGGGAGTTAG TGAAATGAAA AAGAGAGTTT TAAGGTTTGT TTCCCGGTTA ATATTGGCAG 6300

TGTTTATTAT GAGCATAAGT TTAGTGGGAT CAATGAGTTA TTTTCCTGTA AAGACCGAAG 6360

CTGCACCTGA CTGGAGTATA CCGAGTTTAT GGGAGAGTTA TAAGAATGAT TTTAAGATAG 6420 

GGGTAGCGAT ACCTGCGAGA TGTTTGAGCA ATGATACAGA CAAGCAAATG GTGTTGAAGC 6480

ATTTTAACAG TATTACAGCA GAGAATGAGA TGAAGCCTGA AAGTTTATTG GCGGGGCAGA 6540 

CAAGCACGGG ATTGAGTTAC AGGTTTAGCA CAGCTGATAC GTTTGTTAAC TTTGCGAATA 6600 

CGAACAATAT AGGGATTAGA GGGCATACAC TGGTATGGCA TAATGAAACA CCTGATTGGT 6660 

TTTTTAGAGA CAGCAGTGGG CAGATGTTAT CGAAAGATGC ACTGTTAGCG AGGCTGAAGC 6720 

AATACATTTA TGATGTTGTT GGCAGGTATA AGGGTAAGGT ATATGCATGG GACGTTGTAA 6780 

ATGAGGCTAT AGATGAGAGT CAGCCTGATG GATATAGACG TTCGACATGG TATCAAATCT 6840

GTGGTCCGGA GTATATAGAG AAGGCATTCA TATGGGCGCA CGAAGCCGAT CCGAATGCGA 6900 

AGCTGTTTTA TAATGACTAT AATACAGAGA TTTCAACAAA GAGAGATTTC ATATACAACA 6960

TGGTAAAGAA TTTAAAATCC AAGGGTGTGC CGATTCATGG TATAGGGATG CAGAGCCATA 7020

TAAACGTGAA CTGGCCATCG GTGAGTGAGA TAGAGAACAG TATAAAACTG TTTAGTTCGA 7080 

TACCTGGGAT TGAGATTCAC ATTACAGAGC TTGACATGAG TTTATACAAC TATGGATCAA 7140

ACGAGAATTA TTCAACACCG CCGCAGGATT TGCTTCAGAG GCAGGCACAG AAGTACAAAG 7200 

ATATATTTAC AATGCTGAGG AAATACAAAG GTATTGTAAC ATGTGTTACA TTCTGGGGTT 7260

TGAAGGATGA CTATTCATGG CTGAACTCAT CCAGTAAGAG GGATTGGCCG CTGTTGTTTT 7320

TTGATGATTA CAGTGCAAAG CCGGCGTATT GGTCGGTGAT TGAGGCAGCA GGTGCAAGTG 7380 

CATCTCCAAG CCCGACAGTG ACAGCAACGC CGACGCCGAC TCCGACGCCG ACAGTGACTG 7440

TTACGGCGAC TCCGACACCG ACACCAACAG GGACACCTGG TACGGGAAGT GGTTTGAAGG 7500 

TACTATACAA GAACAATGAG ACCAGTGCGA GCACAGGTTC TATAAGGCCG TGGTTTAAGA 7560 

TAGTGAATGG AGGCAGCAGC AGTGTTGATC TTAGCAGGGT TAAGATAAGA TACTGGTACA 7620 

CAGTGGATGG TGACAAGCCA CAGAGTGCGG TATGTGACTG GGCACAGATA GGTGCAAGCA 7680

ATGTGACATT CAATTTTGTG AAGCTGAGCA GCGGAGTGAG TGGAGCGGAT TATTACTTGG 7740 

AGGTAGGATT TAGCAGTGGA GCTGGGCAGT TGCAGCCTGG TAAGGACGCA GGGGATATAC 7800 

AGGTAAGGTT TAACAAGAAT GACTGGAGCA ATTACAATCA GGCAGACGAC TGGTCATGGT 7860

TGCAGAGCAT GACGGATTAT GGAGAGAATG CGAAGGTGAC GCTGTATGTA GATGGTGTTC 7920 

TGGTATGGGG GCAGGAGCCG GGAGGAGCGA CACCTGCACC GACAGCGACA GCAACACCAA 7980

CGCCAATTCC GACAGCAACA GTAACACCGA CGCCGACAGC AACTCCAACG TCTACACCGA 8040

GACCGACAGC GACAGCGACC CCGACACCGA CAGTGAGTGC AACGCCAACA CCGGCACCGA 8100

CGGCATCACC GGTAGGTGGC AGTTACTGGA CGCCGAGTGA GAGTTACGGT GCGCTGAAGG 8160 

TATGGTATGC GAATGGGAAT TTAAGCAGCC CGACGAATGT ATTGAATCCT AAGATAAAGA 8220 

TAGAGAATGT TGGGACGACA GCGGTAGATC TTAGCAGGGT GAAGGTAAGA TACTGGTACA 8280 

CGATAGATGG TGAGGCAACA CAGAGTGTAA GTGTAGCGAG CAGCATAAAT CCTGCGTATA 8340

TAGATGTGAA GCTTGGAGCG AACGCAGGCG GAGCGGATTA CTATGTAGAG ATAGGGTTTA 8400

AGAGTGGAGC AGGTGTTTTG GCAGCAGGGC AGAGCACGAA GGAGATAAGA CTTAGCATAC 8460

AGAAGGGCAG TGGCAGCTAC AATCAGTCAA ATGACTATTC GGTGAGGAGT GCAACAGGCT 8520

ATATAGAGAA CGAGAAGGTA ACGGGGTATA TAGATGATGT ACTTGTATGG GGGAGAGAGC 8580 

CGAGCAGGAA CGCCCAGATC AAGGTATGGT ATGCGAATGG GAATTTAAGC AGCCCGACGA 8640

ATGTATTGAA TCCTAAGATA AAAATAGAGA ATGXTGGGAC GACAGCGGTA GATCTTAGCA 8700
GGGTGAAGGT AAGATACTGG TACACGATAG ATGGTGAGGC AACACAGAGT GTAAGTGTAA 8760

CAAGCAGCAT AAATCCTGCG TATATAGATG TGAAGTTTGT GAAGCTTGGA GCAAATGCAG 8820

GCGGAGCGGA TTACTATGTA GAGATAGGGT TTAAGAGTGG AGCAGGTGTT TTGGCAGCAG 8880

GGCAGAGCAC GAAGGAGATA AGGCTTAGCA TACAGAAGGG CAGTGGCAGC TACAATCAGT 8940

CAAATGACTA TTCGATAAGA AGTGCGAATA GCTATATAGA GAACGAGAAG GTAACAGGGT 9000

ATATAGATGG TGCGATAGTG TGGGGAAGAG AGCCGAGCAG GGGTACAAAG CCGGCGGGAG 9060

TAGTAACACC GACACCGGCA CCGACCCCGA CATCGACGCC AACACCGATA CCTACAACCA 9120

CACCGACACC GACACCGACA CCGACTGTGA CGGTGACCCC AACTTCTACA CCCACACCGG 9180

TTTCATCATC CACTCCTACA CCAACAGCAA CGCCAACACC TACACCTTCT ATCACGATAA 9240

CACCAGCGCC AACTGCAACA CCCACTCCGA CTCCTTCTGT CACAGATGAT ACAAATGATG 9300

ATTGGTTATT TGCGCAGGGT AACAAAATAG TCGACAAGGA TGGCAAACCT GTATGGTTAA 9360

CAGGAGTTAA TTGGTTTGGA TTTAATACAG GAACGAATGT GTTTGATGGT GTGTGGAGTT 9420

GTAATCTTAA AAGTGCATTA GCTGAGATTG CAAACAGAGG ATTTAATTTG CTAAGAGTAC 9480

CGATTTCAGC AGAGCTGATT TTGAATTGGT CGAAAGGAAT TTATCCAAAA CCAAATATCA 9540

ATTATTATGT TAACCCTGAG TTAGAAGGTC TGACGAGTTT AGAGGTATTT GATTTTGTAG 9600

TAAAAACATG CAAAGAAGTT GGACTGAAAA TTATGTTGGA TATTCATAGT GCAAAAACTG 9660

ATGCGATGGG GCATATATAT CCGGTATGGT ATACAGATAC TATAACGCCA GAAGATTATT 9720

ATAAAGCATG TGAATGGATC AGAGAGAGAT ATAAAAATGA TGATACAATT GTAGCATTTG 9780

ATTTGAAGAA TGAGCCACAT GGTAAACGAT GGCAAGATAG TGTTTTTGCA AAATGGGACA 9840

ATTCAACAGA TATTAACAAC TGGAAATATG CAGCTGAGAC CTGTGCGAAG AGAATACTTG 9900

CAAAAAATCC AAACATGTTA ATAGTAATTG AAGGAATAGA AGCTTATCCA AAAGATGATG 9960

TTACGTGGAC TTCTAAATCA TCAAGTGACT ATTATTCTAC CTGGTGGGGC GGCAACTTAC 10020

GGGGTGTTAA AAAGTATCCA ATAAACCTTG GACAGTATCA GAACAAAGTG GTTTATTCAC 10080

CACATGATTA TGGACCATTG GTTTACCAGC AACCCTGGTT TTATCCTGGA TTTACCAAAG 10140

ATACGCTTTA CAATGATTGC TGGAGGGATA ATTGGACTTA TATTATGGAT AATGGGATAG 10200

CTCCGTTGCT CATTGGTGAA TGGGGTGGTT ACTTAGATGG TGGCGATAAT GAAAAGTGGA 10260

TGACTTATTT GAGAGATTAT ATTATAGAAA ACCATATTCA TCATACATTC TGGTGTTACA 10320

ATGCAAATTC TGGTGATACT GGAGGATTGG TGGGATATGA TTTTTCGACG TGGGATGAAC 10380

AGAAGTACAA TTTCTTAAAA CCAGCTTTAT GGCAGGATAG TAAAGGAAGA TTTGTTGGGC 10440

TTGATCACAA GAGACCACTG GGTACAAATG GGAAGAATAT AAATATAACT ATTTATTACC 10500

AGAACGGTGA AAAACCGCCT GTCCCAAAGA ATTAATAAAT GGATGAATAC TTCTTTTGTA 10560

AATGTGATGG AGGCTACTCA AAGTTGATTT GGTAGCCTCT ATCTTTTTAA AAAAGTGGCT 10620

CTATAATAAA TATTATTGTG GGGAGAAAGG GAAAAATATA GCACTTATTC TGAATGCCTG 10680

CTAAATAGAA ACTGTTTTAT GACAAAAGAA CACTCAAAAA GAAAGGAGGC ATCCAGAATA 10740

AGTGCTTAAA TCTAATTATA TCACTGGACT TTTAAAATCG AAAGATATCA TTCTTCTTCA 10800

AATGGATGAG AATGAAAGTG AAATAGAACT TCACATAAGT TACAAGGAAG GTACATGATT 10860

ATCACACGCA GAAGGTAAAA GACATACCTA GACATACCTA TAATAATGGG CAAGAAAACA 10920

ATTTGATTAT AAGAAAGAGA AGATATGTCT GCAAAGCATG TGGGAAGAAG TTTTTTGAAC 10980

ACATAAGTTT TATAGGCAAA TCTCAAAGGA TGACAAATAG ACTTGCAGCA TATATTATAA 11040

GTCAACTTGG AAGTTTAACA AGTATGAAAG AGATAGCAAA ACACACAAAT GTTTCAGATG 11100

TAGCAGTTAT GAGATTGTTT GATAAAGTAA ACCCTGGTCA AACTCTAGAT GAGTTTTCTT 11160

CTGAAGCAAT ATGGGCAGAA TAGTTTAAAG GCAATGCTTT TGCAAATTTC TGATGCACTT 11220

AAAAAAATGA CAGATTCCAT GCAAAAGTTC AAAGATTATT TTGAAGGATA CAAAAACACA 11280

AATGTATTCT ATGTGCCACC CGGGGCTTTT GAAAGCAGTA ATTTTGAAAA AGCACTGGAT 11340

TCATACATGG ATAAGAGCAG AAAAATAACA AAAATTATCC TTATCCTTGA CACCAATCCT 11400

TATACAAACA AAGCAATAGA TATTGTTGAC AGTGTTGAAA AGGTATTGAA AAACTCTTTG 11460

GAGTTTGTTG ATACAAAATT CACTGAATTT GGTGTTGGCG GAATATCGTC AAGCAATCAC 11520

GATTTGAGAA GCATCTATTT TAAAGATTTT AGGACTTTGA GACTCATAAT GATAATTAGC 11580

ATTTTGCTTT TGATGTTCAT AATTTCAAGA TCAATATTCA ATGCGGCAGC GGTTGTGGCA 11640

ATAGTTTTTA TTGATTACTA CCTTGCACTT TCTATTACAG AGATGATTTT CAAAGGTATT 11700

TTCAATT 11707
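The listing does not itself say which stretch of SEQ ID NO: 1 encodes the polypeptide given later as SEQ ID NO: 43. As a rough consistency check, the reading frame that opens with the ATG in the line ending at position 6300 translates to the same opening residues (Met Lys Lys Arg Val ...). The following is a minimal Python sketch using the standard genetic code; `translate` and `to_three_letter` are illustrative helper names, not anything defined in the source.

```python
# Standard genetic code; codons enumerated with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# One-letter to three-letter codes, matching the style of the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "X": "Xaa",
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string up to the first stop codon.

    Whitespace is ignored; any codon containing a non-ACGT symbol
    (for example an IUPAC ambiguity code) becomes 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def to_three_letter(protein: str) -> str:
    """Render a one-letter protein string in three-letter notation."""
    return " ".join(THREE_LETTER[aa] for aa in protein)


if __name__ == "__main__":
    # First 48 bases of the frame starting at the ATG in the line ending 6300.
    start = "ATGAAAAAGAGAGTTTTAAGGTTTGTTTCCCGGTTAATATTGGCAGTG"
    print(to_three_letter(translate(start)))
```

Run as shown, it prints `Met Lys Lys Arg Val Leu Arg Phe Val Ser Arg Leu Ile Leu Ala Val`, which matches the first row of SEQ ID NO: 43. Stopping at the first stop codon keeps the sketch simple; a real ORF search would also scan all six frames.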









(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6416 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GTCGACACTT GACTGRRGCG GGCAGCCGGA TACATGGAAT GGGACATATA CGGGCAATCC 60 
AAATCTGCAT GTGAAGATAG TGGATTATGG AACAGATTTG GGTATAACTG CATCACTTGC 120

GAATGCGCTT TTGTACTACA GTGCGGCGAC GAANGAGTAT GGAGTATCTG ATGAGGCAGC 180 
GAANAATTTA GCGAAAGAGC TGCTGGACAG GATGTGGAAC TTATACAGGG ATGACAAGGG 240 
CTTGTCGGCA CCCGAGAAGA GAGGAGATTA CAAGAGGTTC TTTGAGCAAG AGGTATACAT 300 
TCCAGCGGGC TGGACAGGGA AGATGCCGAA TGGAGATGTA ATAAAGAGTG GAGTGAAGTT 360 
TATAGACATA AGGAGCAAGT ACAAACAGGA TCCTGACTGG CAGAAGCTGG TTTCGGCATA 420 

CAATGCAGGA GAGGCACCGG AGTTCAGGTA TCACAGATTC TGGGCACAGT GTGATATAGC 480

AATTGCCAAT GCAACATATG AAATCCTGTT CGGCAATCAG TAAGTCAAAA GTGGGTGTGT 540 
GAAAGATATT AGGAAGGGAA GTAGCACCGC TCTGTGCTAC TTCCCCAATT TGAAAAGTTA 600 
AATAAAAACA AAGTTAATTA AGAGAGGGGT AGGATGCAAG AAATGAAAGC AATTAAGAGG 660 
GTTGTCTCGA TAACTGCTCT ACTTGTTTTG ACACTTTCAT TATGTTTTCC TGGTATCATG 720 
CCTGTGAAAG CTTATGCAGG GGGAACATAT AATTACGGTG AGGCACTACA GAAAACAATA 780 

ATGTTCTATG AATTCCAGAT GTCAGGGAAA CTACCTTCCT GGGTAAGGAA CAACTGGAGA 840

GGTGACTCTG GCTTAGATGA TGGCAAGGAT GTAGGGCTTG ATTTAACAGG TGGCTGGCAT 900 
GACGCAGGTG ATCACGTAAA GTTTAACCTG CCAATGTCGT ATAGCGCCTC AATGCTGGGG 960 

TGGGCTGTTT ATGAATATAA GGATGCATTT GTAAAGAGCA AACAATTGGA GCACATTTTA 1020

AATCAAATAG AGTGGGCAAA TGACTACTTT GTGAAGTGTC ATCCATCAAA ATATGTATAC 1080

TATTATCAGG TTGGTGATCC AACTGTAGAT CACAATTTTT GGGGACCTGC AGAAGTAATG 1140

CAAATGAAAC GTCCAGCGTA TAAGTGTGAT TTATCAAACC CAGCATCTTC TGTAGTGGCA 1200 

GAAACAGCTG CATCACTTGC GGTGGCTTCA GTTGTAATAA AGGAAAGAAA TTCTCAGAAA 1260 

GCAGCTTCTT ATCTCCAACA TGCCAAAGAC CTGTTTGAAT TTGCCGATAC CACAAGAAGT 1320 

GATGCGGGGT ATACTGCTGC AACAGGTTTC TACACATCGG GTGGTTTTAT TGATGACCTT 1380 

GGATGGGCTG CTGTATGGCT TTATATTGCG ACAAATGACA GTAGTTATTT GACGAAAGCT 1440

GAAGAGTTGA TGTCAGAATA TGCTAATGGT ACTAATACAT GGACACAATG CTGGGATGAT 1500

GTTCGATATG GAACATTGAT CATGCTTGCA AAGATTACAG GGAAAGAGTT ATATAAAGGA 1560 

GCTGTGGAAA GAAACTTAGA CCATTGGACT GACAGAATTA CGTATACGCC GAAAGGGATG 1620 

GCATATCTGA CAGGATGGGG TTCATTAAGA TATGCGACAA CAGCTGCATT TTTAGCATGT 1680 

GTCTATGCAG ACTGGTCAGG GTGCGATTCG AACAAAAAGA CCAAATATTT GAACTTTGCA 1740 

AAAAGCCAGA TTGACTATGC ACTGGGTTCC ACAGGTAGAA GTTTTGTAGT AGGATTTGGC 1800 

ACCAATTATC CACAACATCC GCATCACAGG AATGCGCATA GTTCATGGGC TAACAGCATG 1860

AAAATACCAG AGTATCACAG ACACATATTA TATGGAGCAC TGGTTGGTGG TCCTGGTAGT 1920

GATGATAGTT ATAATGATGA CATTACCGAT TATGTACAAA ATGAGGTTGC CTGCGATTAT 1980

AATGCTGGAA TTGTTGGTGC ACTGGCAAAG ATGTACCAGT TATATGGAGG TGAACCTATT 2040 

GATGATTTTA AAGCAATTGA AACACCCACA AATGATGAAA TTTTTGTTGA ATCAAAATTT 2100 

GGGAATTCAC AGGGTCCAAA TTATACCGAA GTAATTTCCT ATATCTATAA TCGAACAGGA 2160 

TGGCCACCAA GGGTAACTGA TAAACTAAGT TTTAAATATT TTATAGACCT AACCGAATTA 2220 

ATCCAGGCAG GGTATTCGCC TGATGTTGTC AAAGTTGACA CATACTACAT CGAAGGAGGT 2280 

AAAATTAGCG GTCCTTACGT ATGGGACAAA AATAGGAATA TATACTATGT TCTTGTGGAT 2340

TTTAGTGGAA CCAAGATATA TCCTGGCGGT GAAGTTGAAC ACAAAAAGCA GGCTCAATTT 2400

AAAATATCTG TTCCGCAGGG GTATCCATGG GATCCTACCA ATGATCCTTC ATATAAGGGA 2460 

TTAACCAGTC AATTAGAAAA GAATAAATAT ATTGCCGCAT ATGATAATAA TAATCTGGTA 2520

TGGGGTTTAG AGCCGGGTGC GGCAACATCC ACACCTGCAC CAACATCAAC ACCAACACCA 2580 

ACCCCGACCC CAACACCAAC AGTGACAGCA ACGCCGACGC CGACTCCTAC ACCGACACCG 2640 

ACGGGGTCAC CTGGTACGGG AAGTGGTGTG AAGGTACTGT ACAAGAACAA TGAGACAAGT 2700 

GCGAGCACAG GTTCTATAAG GCCGTGGTTT AAGATAGTGA ATGGAGGCAG CAGCAGTGTT 2760

GATCTTAGCA GGGTTAAGAT AAGATACTGG TACACAGTGG ATGGTGACAA GCCACAGAGT 2820

GCGGTATGTG ACTGGGCACA GATAGGGGCA AGCAATGTGA CATTCAATTT TGTGAAGCTT 2880

AGCAGCGGAG TGAGTGGAGC GGATTATTAC CTGGAGGTAG GATTTAGCAG TGGAGCTGGG 2940

CAGTTGCAGC CTGGTAAGGA CACAGGGGAT ATACAGGTAA GGTTTAACAA GAATGACTGG 3000 

AGCAATTACA ATCAGGCAGA CGACTGGTCA TGGTTGCAGA GCATGACGAA TTATGGAGAG 3060

AATGCGAAGG TGACGCTGTA TGTAGATGGT GTTCTGGTAT GGGGGCAGGA GCCGGGAGGA 3120 

GCGACACCTG CACCGACAAG CACAGCAACA CCAACGCCAA CTCCGACAGC AACCCCAACA 3180 

CCTACACCTA CACCGACCCC GACACCGACA GTGAGTGCAA CGCCAACACC GGCACCGACG 3240 

GCATCACCGG TAGGTGGCAG TTACTGGACG CCGAGTGAGA GTTACGGTGC GCTGAAGGTA 3300 

TGGTATGCGA ATGGGAATTT AAGCAGCCCG ACGAATGTAT TGAATCCTAA GATAAAGATA 3360 

GAGAATGTTG GGACGACAGC GGTAGATCTT AGCAGGGTGA AGGTAAGATA CTGGTACACG 3420

ATAGATGGTG AGGCGACACA GAGTGTAAGT GTAGCGAGCA GCATAAATCC TGCGTATATA 3480

GATGTGAAGT TTGTGAAGCT TGGAGCGAAC GCAGGCGGAG CGGATTACTA TGTGGAGATA 3540
GGCTTTAAGA GTGGAGCAGG TGTTTTGGCA GCAGGGCAGA GCACGAAGGA GATAAGGCTT 3600

AGCATACAGA AGGGCAGTGG CAGCTACAAT CAGTCAAATG ACTATTCGGT AAGGAGTGCG 3660 

AATAGCTATA TAGAGAACGA GAAGGTAACA GGGTATATAG ATGATGTACT TGTATGGGGA 3720 

AGAGAGCCGG GCAGGAACGC CCAGATCAAG GTATGGTATG CGAATGGGAA TTTAGGCAGC 3780

ATGACGAATG TATTGAATCC TAAGATAAAG ATAGAGAATG TTGGGACGAC AGCGGTAGAT 3840 

CTTAGCAGGG TGAAGGTAAG ATACTGGTAC ACGATAGATG GTGAGGCGAC ACAGAGTGTA 3900 

AGTGTAACAA GCAGCATAAA TCCTGCGTAT ATAGATGTGA AGTTTGTGAA GCTTGGAGCA 3960

AATGCAGGTG GAGCGGATTA CTATGTGGAG ATAGGGTTTA AGAGTGGAGC AGGTGTTTTG 4020 

GCAGCAGGGC AGAGCACGAA GGAGATAAGG CTTAGCATAC AGAAGGGCAG TGGCAGCTAC 4080 

AATCAGTCAA ATGACTATTC GGTAAGAAGT GCGACAGGCT ATATAGAGAA CGAGAAGGTA 4140

ACAGGGTATA TAGATGGTGC GATAGTGTGG GGAAGAGAGC CGAGCAGGGG TACAAAGCCG 4200 

GCGGGAGGAG TGACACCGAC ACCGGCACCG ACGCCGACAT CGACGCCAAC ACCAACACCT 4260 

ACAACCACAC CGACACCGAC ACCGACTGTG ACGGTGACCC CAACTCCTAC ACCTGCGGTA 4320 

ACCCCCGATG TTAAAATATC GATCGATACG TCCAGGGGAA GAACAAAGAT AAGCCCGTAT 4380 

ATTTATGGAG CAAATCAGGA TATCCAGGGT GTTGTTCACC CTGCAAGACG ACTTGGTGGG 4440

AACAGATTGA CGGGTTACAA TTGGGAGAAC AATATGTCCA ATGCAGGGAG TGACTGGTAT 4500 

CATTCAAGCG ATGATTATAT GTGTTATATT ATGGGTATAA CAGGGAATGA TAAGAACGTT 4560

CCAGCAGCTG TTGTAAGCAA ATTTCACGAG CAGTCAATAA AGCAAAATGC ATATTCAGCC 4620

ATCACATTAC AGATGGTAGG TTATGTGGCA AAGGATGGGA ATGGTACAGT GAGCGAGTCA 4680

GAGACAGCTC CGTCGCCGAG ATGGGCTGAG GTCAAGTTTA AAAAAGATGG TGCACTGTCA 4740

TTGCAGCCTG ACGTGAATGA TAACTATGTA TATATGGATG AGTTTATTAA CTATCTGATT 4800

AATAAGTATG GTCGATCATC GTCTGCAACG GGAATTAAAG GATATATACT TGACAACGAG 4860

CCGGACTTAT GGTTTACTAC TCATCCGCGA ATTCATCCAC AGAAGGTAAC CTGCAGTGAA 4920

TTGATAAATA AATCGGTGGA GCTGGCGAAA GTAATAAAGA CACTTGATCC AGATGCAGAA 4980

ATTTTTGGAC CTGCATCGTA TGGTTTTGTG GGATATTTAA CATTGGAGGA TGCACCTGAC 5040

TGGAATCAGG TTAAAGGAAA TCACAGATGG TTTTTGAGCT GGTACCTTGA GCAGATGAAG 5100 

AAAGCATCGG ATAGTTTTGG GAARAGGTTA TTGGATGTAC TTGACATACA CTGGTACCCG 5160

GAGGCGCAGG TTGGCGGTGT GCGAATATGC TTTGACGGTG AAAATAGTAC TTCAAGGGAT 5220 

GTGGCAATAG CGAGGATGCA GGCACCGAGA ACGCTATGGG ATCCGACATA TAAAACCACC 5280 

CAGAAAGGTC AGATAACAGC GGGAGAAAAT AGCTGGATAA ACCAATGGTT TCCAGAGTAT 5340

CTTCCACTGC TTCCCAATAT AAAGGCAGAT ATAGACAAGT ATTATCCTGG TACCAAACTT 5400 

GCTATAACTG AGTTTGATTA TGGAGGGAAG GACCATATAT CGGGAGGAAT AGCTTTAGCA 5460

GATGTGTTAG GGATATTCGG CAAGTATGGA GTATACATGG CAGCAAGATG GGGAGATTCG 5520 

GGGAGCTATG CACAGGCGGC GTACAACATT TATCTCAACT ATGATGGGAA AGGTTCGAGA 5580

TACGGTTCAA CGTGTGTGAG CGCTGAGACA ACTGACGTTG AGAACATGCC GGTATATGCT 5640 

TCAATTGAGG GAGAAGATGA TTCGACTGTG CATATTATAT TAATTAACAG GAATTATGAC 5700

AGGAAACTGA AGGCAGAGAT AAAGATGAAT AATACCAGGG TATACACAGG TGGAGAGATA 5760 

TACGGATTTG ACAGTACAAG CTCTCAGATC AGGAAGATGG GAGTGCTCAG TAATATACAA 5820

AACAACACAA TCACCATAGA AGTTCCAAAT CTGACGGTAT ACCATATTGT TTTAACTTCT 5880

TCAAAGTAGA TTAAAGAATA AAAATGGAGA CACTGCTGCA TGGTAAAAGT TGAGATGTGC 5940 

AGCAGTGTCT CATAATCACT AATCTAATAC AGTTAGAGAT GTTAAATTAT AAAACAGACG 6000 

ATAACTTTGT TTTAAATGAT TGNNAGTCGG ANTTCTNNTG ATTAAAACAT NAGAAANTTG 6060 

TNATANTNGA CTTTAATTNT NGCNNATAAA CGTAAATGGA TTCAATNACN WTACRATTTN 6120 

CRTAATCTAW AAGRAGCACA GAGAAATATT ACATAGGAGG ATGTATCAAT AAATGATAGA 6180

TAAAAAGATA ATTGCTGTTA CAATTTTART AATGGTAACA TACTTTTTAG TACAAATATC 6240

RACTATAGGT GCACGGAATA TACCAGAGAC ATANTGGATA CCGCTGGATA TAGATACAAT 6300 

AAGTATTGAC CTGGGCWAGN AGCCATATGT GANAGAATTT ATAGTATATT TTGGATATGG 6360 

CGGAGGCAAA ATAGASTGTC WGTTTTATAG AGACAATACT TTGGCATTMT ACATCA 6416
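Each line of a listing such as SEQ ID NO: 2 ends with the position of its last base, so a scanned or retyped copy can be checked mechanically: the running base count must equal the trailing number on every line. SEQ ID NO: 2 also contains IUPAC ambiguity symbols (R, N, W, S, M), which the sketch below tallies separately. This is a minimal Python sketch that assumes each line consists only of letter groups followed by one integer; `check_listing` is a hypothetical helper, not part of the source.

```python
import re

# Assumed line layout: letter groups separated by spaces, then the
# cumulative position of the line's last base.
LINE_PATTERN = re.compile(r"^([A-Za-z ]+?)\s+(\d+)\s*$")


def check_listing(lines, start=0):
    """Return (final position, count of non-ACGT symbols).

    `start` is the position reached before the first line (0 for a whole
    listing). Raises ValueError if a line cannot be parsed or its trailing
    number disagrees with the running base count.
    """
    total = start
    ambiguous = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank separator lines carry no sequence
        match = LINE_PATTERN.match(text)
        if match is None:
            raise ValueError(f"unparsable line: {line!r}")
        bases = match.group(1).replace(" ", "").upper()
        total += len(bases)
        ambiguous += sum(1 for base in bases if base not in "ACGT")
        if total != int(match.group(2)):
            raise ValueError(
                f"line ends at {match.group(2)} but {total} bases were counted"
            )
    return total, ambiguous
```

Applied to the last two lines of SEQ ID NO: 2 with `start=6300`, it returns `(6416, 6)`. A line whose trailing number has been split by a stray space, or that has a page-margin number fused to its front, fails to parse and raises `ValueError`.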

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CCTTTATGAA TTCATTTACT GACTGCTA 28



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CTTCCCTCGA GAATTCACAC ACCCACTTTT G 31 
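Several of the primers contain recognition sequences of common restriction enzymes; for example SEQ ID NO: 3 contains GAATTC (EcoRI) and SEQ ID NO: 4 contains both CTCGAG (XhoI) and GAATTC. The listing does not annotate these sites, so the following Python sketch simply scans a primer for a small, assumed set of enzymes; the enzyme list and `find_sites` are illustrative choices, not taken from the source.

```python
# Recognition sequences of a few common type II restriction enzymes.
# The choice of enzymes is an assumption; the listing names none.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "XhoI": "CTCGAG",
    "NcoI": "CCATGG",
    "NdeI": "CATATG",
    "SphI": "GCATGC",
    "BamHI": "GGATCC",
    "SalI": "GTCGAC",
    "BglII": "AGATCT",
}


def find_sites(primer: str) -> dict[str, list[int]]:
    """Map enzyme name to the 1-based start positions of its site.

    Every site above is palindromic, so scanning one strand is enough.
    """
    seq = "".join(primer.split()).upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = [
            i + 1
            for i in range(len(seq) - len(site) + 1)
            if seq.startswith(site, i)
        ]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    print(find_sites("CTTCCCTCGA GAATTCACAC ACCCACTTTT G"))
```

For SEQ ID NO: 4 this reports XhoI starting at position 6 and EcoRI at position 11. Non-palindromic sites would additionally require scanning the reverse complement.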
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TACCCCTCGA GAATTCCTAT TTACTCATTA 30
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CTACACCCAT GGTAACCCCC GATGTTAA 28 
(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
AAATGCTCGA GTAAAAGTGA ACAAGCA 27 
(2) INFORMATION FOR SEQ ID NO: 8: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear






EP0 921 188 A2 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
ATGTGTCCAT GGCATTAATT ATTTTTGTTG

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
ATGCAAGGCA TGCAAGCAAT TAAGAGGGTT G 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10 
TCAACAAAGA TCTAATCATT TGTGGGTGTT TC 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 
GTGCAGCTCG AGCTCCTCCC GGCTCCTGCC CCCA 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 

GAGGAACGGT CATATGAAGG TATGGTATGC GAATGGGAA 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 38 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GAGGAGGAGC ATGCAGATCA AGGTATGGTA TGCGAATG 38 
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
TTTAGCATGC TGAGGAAATA CAAAG 25

(2) INFORMATION FOR SEQ ID NO: 15: 






(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

AGTTAGTGGC ATGCAAAAGA GAGTTTTAAG G 31 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GAAGTATGGA TCCATTTATT AATTCTTTGG G 31 
(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
TACAATTTTA GCCATGGTAA CATACTTTTT AG 32
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GCAGCAGTGT CGACATTTTT ATTCTTTAAT CTAC 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
GTGGATGAGA TCTAACCCGG CTCTAAACCC CA 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
TTGAACTTCC CCATGGCAGA ATTTTTACAA ATTGG 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

TGTATCCCAT GCCGTCTT 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CAAAAAGCAA TTATGTTTTA TGAATT 26 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

TGGTGCTGGC AATGTTGAGT TGGC 24 

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

TCGGTAGTGC CACTTTCAAA TCCA 24 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
CAAAGCAGAC GAATCTGTGC GTGGTATGCA ATATAC 36 
(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
AGCTGAGCAG CGGAGTGA 18

(2) INFORMATION FOR SEQ ID NO: 27: 






(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

TCCACTCACT CCGCTGCT 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
GTTCTGATAC TGTCCAAG 

(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 
ACAGGCGGCG TACAACAT 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
TTGAGGGATA TGGTGACC 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
GAGAAACATA TCCTGCAA 

(2) INFORMATION FOR SEQ ID NO: 32: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

CCCATTTTAT ACCCAGGC 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
TCTTGAGCAG CCATTGGA 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 
GATGGCCAGT TCACGTTTAT ATGG 

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
AGCACTGGTT GGTGGTCCTG GTAG 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
GATTGACGGG TTACAATTGG GAGAAC

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 
AGWGCACCNA CAAATCCGGC ATTGTARTC 
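SEQ ID NO: 37 is a degenerate oligonucleotide: W, N and R are IUPAC ambiguity codes (W = A or T, N = any base, R = A or G), so the written 29-mer stands for a mixture of 2 × 4 × 2 = 16 distinct sequences. A small Python sketch that counts and enumerates such variants; the helper names are illustrative only.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def clean(primer: str) -> str:
    """Drop the spaces used to group bases in the listing."""
    return "".join(primer.split()).upper()


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for symbol in clean(primer):
        count *= len(IUPAC[symbol])
    return count


def expand(primer: str) -> list[str]:
    """All concrete sequences, in the order itertools.product yields them."""
    choices = (IUPAC[symbol] for symbol in clean(primer))
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    seq37 = "AGWGCACCNA CAAATCCGGC ATTGTARTC"
    print(degeneracy(seq37))  # 16
```

`degeneracy` is the cheaper call when only the mixture size matters; `expand` is useful for checking each variant against a template, at a cost that grows multiplicatively with every ambiguous position.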

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 
CTCCAGAATG TCATTTGTAA GATACAT 

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
GGGAATTCCA TATGGCGGCG TATAATTACG GTGAG 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 

TATTATTATC ATATGCGGC

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
CCAGAGTATC ACAGACAC 18 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Other 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

CCTGGATCCC TACGCTCCTC CCGGCTC 27 
(2) INFORMATION FOR SEQ ID NO: 43: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1426 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 





(xi) SEQUENCE 


DESCRIPTION: 


: SEQ ID 


NO: 


43: 










Met 


Lys 


Lys 


Arg 


Val 


Leu 


Arg 


Phe 


Val 


Ser 


Arg 


Leu 


He 


Leu 


Ala 


Val 


1 








5 










10 










15 




Phe 


He 


Met 


Ser 


He 


Ser 


Leu 


Val 


Gly 


Ser 


Met 


Ser 


Tyr 


Phe 


Pro 


Val 








20 










25 










30 






Lys 


Thr 


Glu 


Ala 


Ala 


Pro 


Asp 


Trp 


Ser 


He 


Pro 


Ser 


Leu 


Trp 


Glu 


Ser 






35 










40 










45 








Tyr 


Lys 


Asn 


Asp 


Phe 


Lys 


He 


Gly 


Val 


Ala 


He 


Pro 


Ala 


Arg 


Cys 


Leu 




50 










55 










60 










Ser 


Asn 


Asp 


Thr 


Asp 


Lys 


Gin 


Met 


Val 


Leu 


Lys 


His 


Phe 


Asn 


Ser 


He 


65 










70 










75 










80 


Thr 


Ala 


Glu 


Asn 


Glu 


Met 


Lys 


Pro 


Glu 


Ser 


Leu 


Leu 


Ala 


Gly Gin 


Thr 










85 










90 










95 




Ser 


Thr 


Gly 


Leu 


Ser 


Tyr 


Arg 


Phe 


Ser 


Thr 


Ala 


Asp 


Thr 


Phe 


Val 


Asn 








100 










105 










110 






Phe 


Ala 


Asn 


Thr 


Asn 


Asn 


He 


Gly 


He 


Arg 


Gly 


His 


Thr 


Leu 


Val 


Trp 






115 










120 










125 








His 


Asn 


Gin 


Thr 


Pro 


Asp 


Trp 


Phe 


Phe 


Arg 


Asp 


Ser 


Ser 


Gly Gin 


Met 




130 










135 










140 










Leu 


Ser 


Lys 


Asp 


Ala 


Leu 


Leu 


Ala 


Arg 


Leu 


Lys 


Gin 


Tyr 


He 


Tyr 


Asp 


145 










150 










155 










160 


Val 


Val 


Gly 


Arg 


Tyr 


Lys 


Gly 


Lys 


Val 


Tyr 


Ala 


Trp 


Asp 


Val 


Val 


Asn 










165 










170 










175 




Glu 


Ala 


He 


Asp 


Glu 


Ser 


Gin 


Pro 


Asp 


Gly 


Tyr 


Arg 


Arg 


Ser 


Thr 


Trp 








180 










185 










190 






Tyr 


Gin 


He 


Cys 


Gly 


Pro 


Glu 


Tyr 


He 


Glu 


Lys 


Ala 


Phe 


He 


Trp 


Ala 
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195 










200 


His 


Glu 


Ala 


Asp 


Pro 


Asn 


Ala 


Lys 




210 










215 




Glu 


lie 


Ser 


Thr 


Lys 


Arg 


Asp 


Phe 


225 










230 






Lys 


Ser 


Lys 


Gly 


Val 


Pro 


He 


His 










245 








Asn 


Val 


Asn 


Trp 


Pro 


Ser 


Val 


Ser 








260 










Phe 


Ser 


Ser 


lie 


Pro 


Gly 


He 


Glu 






275 










280 


Ser 


Leu 


Tyr 


Asn 


Tyr 


Gly 


Ser 


Asn 




290 










295 




Asp 


Leu 


Leu 


Gin 


Arg 


Gin 


Ala 


Gin 


305 










310 






Leu Arg 


Lys 


Tyr 


Lys 


Gly 


He 


Val 










325 








Lys 


Asp 


Asp 


Tyr 


Ser 


Trp 


Leu 


Asn 








340 










Leu 


Leu 


Phe 


Phe 


Asp 


Asp 


Tyr 


Ser 






355 










360 


lie 


Glu 


Ala 


Ala 


Gly 


Ala 


Ser 


AXa 




370 










375 




Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


385 










390 






Thr 


Pro 


Thr 


Pro 


Thr 


Gly 


Thr 


Pro 










405 








Leu 


Tyr 


Lys 


Asn 


Asn 


Glu 


Thr 


Ser 








420 










Trp 


Phe 


Lys 


He 


Val 


Asn 


Gly 


Gly 






4 35 










440 


Val 


Lys 


He 


Arg 


Tyr 


Trp 


Tyr 


Thr 




450 










455 




Ala 


Val 


Cys 


Asp 


Trp 


Ala 


Gin 


He 


4 65 










470 






Phe 


Val 


Lys 


Leu 


Ser 


Ser 


Gly 


Val 










485 








Val 


Gly 


Phe 


Ser 


Ser 


Gly 


Ala 


Gly 








500 










Gly Asp 


He 


Gin 


Val 


Arg 


Phe 


Asn 






515 










520 


Gin 


Ala 


Asp 


Asp 


Trp 


Ser 


Trp 


Leu 




530 










535 




Asn 


Ala 


Lys 


val 


Thr 


Leu 


Tyr 


Val 


545 










550 






Glu 


Pro 


Gly 


Gly 


Ala 


Thr 


Pro 


Ala 










565 








Pro 


He 


Pro 


Thr 


Ala 


Thr 


Val 


Thr 








580 










Ser 


Thr 


Pro 


Arg 


Pro 


Thr 


Ala 


Thr 






595 










600 


Ala 


Thr 


Pro 


Thr 


Pro 


Ala 


Pro 


Thr 




610 










615 




Trp 


Thr 


Pro 


Ser 


Glu 


Ser 


Tyr 


Gly 


625 










630 






Gly Asn 


Leu 


Ser 


Ser 


Pro 


Thr 


Asn 










645 








Glu 


Asn 


Val 


Gly 


Thr 


Thr 


Ala 


Val 








660 










Tyr 


Trp 


Tyr 


Thr 


He 


Asp 


Gly 


Glu 






675 










680 



205 



Leu 


Phe 


Tyr 


Asn 


Asp 


Tyr 


Asn 


Thr 








220 










He 


Tyr Asn 


Met 


Val 


Lys 


Asn 


Leu 






235 










240 


Gly 


He 


Gly 


Met 


Gin 


Ser 


His 


He 




250 










255 




Glu 


He 


Glu 


Asn 


Ser 


He 


Lys 


Leu 


2 65 










270 






He 


His 


He 


Thr 


Glu 


Leu 


Asp 


Met 










285 








Glu 


Asn Tyr 


Ser 


Thr 


Pro 


Pro 


Gin 








300 










Lys 


Tyr 


Lys 


Asp 


Ile


Phe 


Thr 


Met 






315 










320 


Thr 


Cys 


Val 


Thr 


Phe 


Trp 


Gly 


Leu 




330 










335 




Ser 


Ser 


Ser 


Lys 


Arg 


Asp 


Trp 


Pro 


345 










350 






Ala 


Lys 


Pro 


Ala 


Tyr 


Trp 


Ser 


Val 










365 








Ser 


Pro 


Ser 


Pro 


Thr 


Val 


Thr 


Ala 








380 










Thr 


Val 


Thr 


Val 


Thr 


Ala 


Thr 


Pro 






395 










400 


Gly 


Thr 


Gly 


Ser 


Gly 


Leu 


Lys 


Val 




410 










415 




Ala 


Ser 


Thr 


Gly 


Ser 


Ile


Arg 


Pro 


425 










430 






Ser 


Ser 


Ser 


Val 


Asp 


Leu 


Ser 


Arg










445 








Val 


Asp Gly Asp 


Lys


Pro


Gln


Ser 








460 










Gly 


Ala 


Ser 


Asn 


Val 


Thr 


Phe 


Asn 






475 










480 


Ser 


Gly Ala 


Asp 


Tyr 


Tyr 


Leu 


Glu 




490










495




Gln


Leu


Gln


Pro 


Gly 


Lys 


Asp 


Ala 


505 










510 






Lys 


Asn 


Asp 


Trp 


Ser 


Asn 


Tyr 


Asn 










525 








Gln


Ser 


Met 


Thr 


Asp 


Tyr 


Gly 


Glu 








540 










Asp 


Gly 


Val 


Leu 


Val


Trp 


Gly 


Gln






555 










560 


Pro 


Thr 


Ala 


Thr 


Ala 


Thr 


Pro 


Thr 




570 










575 




Pro 


Thr 


Pro 


Thr 


Ala 


Thr 


Pro 


Thr 


585 










590 






Ala 


Thr 


Pro 


Thr 


Pro 


Thr 


Val 


Ser 










605 








Ala 


Ser 


Pro 


Val 


Gly 


Gly 


Ser 


Tyr 








620 










Ala 


Leu 


Lys 


Val 


Trp 


Tyr 


Ala 


Asn 






635 










640 


Val 


Leu 


Asn 


Pro 


Lys 


Ile


Lys


Ile




650 










655 




Asp 


Leu 


Ser 


Arg 


Val 


Lys 


Val 


Arg 


665 










670 






Ala 


Thr 


Gln


Ser 


Val 


Ser 


Val 


Ala 



685 






EP0 921 188 A2 






Ser 


Ser 


Ile


Asn 


Pro 


Ala 


Trp


Ile


Asp 


Val 


Lys 


Leu 


Gly Ala 


Asn 


Ala 




690 










695 










700 










Gly


Gly


Ala 


Asp 


Tyr 


Tyr 


Val 


Glu 


Ile


Gly 


Phe 


Lys 


Ser 


Gly Ala 


Gly 


705 










710 










715 










720 


Val 


Leu 


Ala 


Ala 


Gly Gln


Ser 


Thr 


Lys 


Glu 


Ile


Arg 


Leu 


Ser 


Ile


Gln










725 










730 










735 




Lys 


Gly


Ser 


Gly 


Ser 


Tyr 


Asn 


Gln


Ser 


Asn Asp 


Tyr 


Ser 


Val 


Arg 


Ser 








740










745 










750 




Ala 


Thr 


Gly


Tyr


Ile


Glu 






Lys 


Val 


Thr 


Gly Tyr 


Ile


Asp Asp 






755










760










765 








Val 


Leu 


Val 


Trp


Gly Arg 


bJLU 


CiO 


Ser Arg 


Asn 


Ala 


Gln


Ile


Lys 


Val 




770 










775










780 








Trp


Tyr


Ala 


Asn 


Gly Asn 


Leu 


Ser 


Ser 


Pro 


Thr 


Asn 


Val 


Leu 


Asn 


Pro 


785










790 










795 










800 


Lys 




Lys 


T 1 ft 


Glu 


Asn 


Val 


Gly 


Thr 


Thr 


Ala 


Val 


Asp 


Leu 


Ser Arg 










805 










810 










815 






Lys 


Val


Arg 


Tyr 


Trp 


Tyr 


Thr 


Ile Asp


Gly 


Glu 


Ala 


Thr 


Gln


Ser 


















825 










830 






Val




Val


Thr


Ser 


Ser 


Ile


Asn 


Pro 


Ala 


Tyr 


Ile


Asp 


Val 


Lys 


Phe 






835










840 










845 






Val


Lys 


T ail 

Leu


Gly


Ala 


Asn 


Ala 


Gly 


Gly Ala Asp 


Tyr 


Tyr 


Val 


Glu 


Ile




850










855 










860 










Gly 


Phe 


Lys 


Ser 


Gly Ala 


Gly Val 


Leu 


Ala 


Ala 


Gly 


Gln


Ser 


Thr 


Lys 


865










870 










875 










880 


Glu 


Ile


Arg 


Leu 


Ser 


Ile


Gln


Lys 


Gly 


Ser 


Gly 


Ser 


Tyr Asn Gln


Ser 










885 










890 










895 




Asn 


Asp 


Tyr 


Ser 


Ile


Arg 


Ser 


Ala 


Asn 


Ser 


Tyr 


Ile


Glu 


Asn 


Glu 


Lys 








900 










905 










910 




Val 


Thr 


Gly 


Tyr 


Ile


Asp 


Gly Ala 


Ile


Val


Trp 


Gly Arg Glu 


Pro 


Ser 






915 










920 










925 








ATy 


pi ,, 

la-Ly 


Thr


Lys 


Pro 


Ala 


Gly Val 


Val 


Thr 


Pro 


Thr 


Pro 


Ala 


Pro 


Thr 




930 










935 










940 










Pro


Thr 


Ser 


Thr 


Pro 


Thr 


Pro 


Ile


Pro 


Thr 


Thr 


Thr 


Pro 


Thr 


Pro 


Thr 


945 










950 










955 










960 


Pro 


Thr 


Pro 


Thr


Val 
965 


Thr 


Val 


Thr 


Pro 


Thr 
970 


Ser 


Thr 


Pro 


Thr 


Pro 
975 


Val 


Ser 


Ser 


Ser 


980


Pro 


Thr 


Pro 


Thr 


Ala 
985 


Thr 


Pro 


Thr 


Pro 


Thr 
990 


Pro 


Ser 


Ile


Thr


Ile


Thr 


Pro 


Ala 


Pro 


Thr 


Ala 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Ser 






995 










1000 








1005 






Val 


Thr 


Asp 


Asp 


Thr 


Asn 


Asp 


Asp 


Trp 


Leu 


Phe 


Ala 


Gln


Gly Asn 


Lys 




1010 








1015 








1020 






Ile


Val 


Asp 


Lys 


Asp 


Gly 


Lys 


Pro 


Val 


Trp 


Leu 


Thr 


Gly Val 


Asn 


Trp 


1025 








1030 








1035 








1040


Phe 


Gly 


Phe 


Asn 


Thr 


Gly 


Thr 


Asn 


Val 


Phe 


Asp 


Gly 


Val 


Trp 


Ser 


Cys 










1045 








1050 








1055 


Asn 


Leu 


Lys 


Ser 


Ala 


Leu 


Ala 


Glu 


Ile


Ala 


Asn Arg 


Gly 


Phe 


Asn 


Leu 








1060 








1065 








1070 




Leu Arg 


Val 


Pro 


Ile


Ser 


Ala 


Glu 


Leu 


Ile


Leu 


Asn 


Trp 


Ser 


Lys 


Gly 






1075 








1080 








1085 


Ile


Tyr 


Pro 


Lys 


Pro 


Asn 


Ile


Asn 


Tyr 


Tyr 


Val 


Asn 


Pro 


Glu 


Leu 


Glu 




1090 








1095 








1100 








Gly 


Leu 


Thr 


Ser 


Leu 


Glu 


Val 


Phe 


Asp 


Phe 


Val 


Val 


Lys 


Thr 


Cys 


Lys 


1105 








1110 








1115 






1120


Glu 


Val 


Gly 


Leu 


Lys 


Ile


Met 


Leu 


Asp 


Ile


His 


Ser 


Ala 


Lys 


Thr Asp 










1125 








1130 








1135 


Ala 


Met 


Gly 


His 


Ile


Tyr 


Pro 


Val 


Trp 


Tyr 


Thr Asp 


Thr 


Ile


Thr 


Pro 








1140 








1145 








1150 




Glu 


Asp 


Tyr 


Tyr 


Lys 


Ala 


Cys 


Glu 


Trp 


Ile


Thr 


Glu 


Arg 


Tyr 


Lys 


Asn 






1155 








1160 






1165






Asp 


Asp 


Thr 


Ile


Val 


Ala 


Phe 


Asp 


Leu 


Lys 


Asn 


Glu 


Pro 


His 


Gly 


Lys 
1170 1175 1180

Pro Trp Gln Asp Ser Val Phe Ala Lys Trp Asp Asn Ser Thr Asp Ile
1185 1190 1195 1200

Asn Asn Trp Lys Tyr Ala Ala Glu Thr Cys Ala Lys Arg Ile Leu Ala

1205 1210 1215

Lys Asn Pro Asn Met Leu Ile Val Ile Glu Gly Ile Glu Ala Tyr Pro

1220 1225 1230

Lys Asp Asp Val Thr Trp Thr Ser Lys Ser Ser Ser Asp Tyr Tyr Ser

1235 1240 1245

Thr Trp Trp Gly Gly Asn Leu Arg Gly Val Lys Lys Tyr Pro Ile Asn

1250 1255 1260

Leu Gly Gln Tyr Gln Asn Lys Val Val Tyr Ser Pro His Asp Tyr Gly
1265 1270 1275 1280

Pro Leu Val Tyr Gln Gln Pro Trp Phe Tyr Pro Gly Phe Thr Lys Asp

1285 1290 1295

Thr Leu Tyr Asn Asp Cys Trp Arg Asp Asn Trp Thr Tyr Ile Met Asp

1300 1305 1310

Asn Gly Ile Ala Pro Leu Leu Ile Gly Glu Trp Gly Gly Tyr Leu Asp

1315 1320 1325

Gly Gly Asp Asn Glu Lys Trp Met Thr Tyr Leu Arg Asp Tyr Ile Ile

1330 1335 1340

Glu Asn His Ile His His Thr Phe Trp Cys Tyr Asn Ala Asn Ser Gly
1345 1350 1355 1360

Asp Thr Gly Gly Leu Val Gly Tyr Asp Phe Ser Thr Trp Asp Glu Gln

1365 1370 1375

Lys Tyr Asn Phe Leu Lys Pro Ala Leu Trp Gln Asp Ser Lys Gly Arg

1380 1385 1390

Phe Val Gly Leu Asp His Lys Arg Pro Leu Gly Thr Asn Gly Lys Asn

1395 1400 1405

Ile Asn Ile Thr Ile Tyr Tyr Gln Asn Gly Glu Lys Pro Pro Val Pro

1410 1415 1420

Lys Asn
1425



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1751 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 



Met 


Gln


Glu 


Met 


Lys 


Ala 


Ile


Lys 


Arg 


Val 


Val 


Ser 


Ile


Thr 


Ala 


Leu 


1 








5 










10 










15 




Leu 


Val 


Leu 


Thr 


Leu 


Ser 


Leu 


Cys 


Phe 


Pro 


Gly 


Ile


Met 


Pro 


Val 


Lys 








20 










25 










30 






Ala 


Tyr 


Ala 


Gly 


Gly 


Thr 


Tyr 


Asn 


Tyr 


Gly 


Glu 


Ala 


Leu 


Gln


Lys 


Thr 






35 










40 










45 








Ile


Met 


Phe 


Tyr 


Glu 


Phe 


Gln


Met 


Ser 


Gly 


Lys 


Leu 


Pro 


Ser 


Trp 


Val 




50 










55 










60 










Arg 


Asn 


Asn 


Trp 


Arg 


Gly 


Asp 


Ser 


Gly 


Leu 


Asp 


Asp 


Gly 


Lys 


Asp 


Val 


65 










70 










75 










80 


Gly 


Leu 


Asp 


Leu 


Thr 


Gly 


Gly 


Trp 


His 


Asp 


Ala 


Gly 


Asp 


His 


Val 


Lys 










85 










90 










95 




Phe 


Asn 


Leu 


Pro 


Met 


Ser 


Tyr 


Ser 


Ala 


Ser 


Met 


Leu 


Gly 


Trp 


Ala 


Val 








100 










105 










110 






Tyr 


Glu 


Tyr 


Lys 


Asp 


Ala 


Phe 


Val 


Lys 


Ser 


Lys 


Gln


Leu 


Glu 


His 


Ile
115 120



Leu 


Asn 


Gln


Ile


Glu 


Trp 


Ala 


Asn 




130 










135 




Ser 


Lys 


Tyr 


Val 


Tyr 


Tyr 


Tyr 


Gln


145 










150 






Asn 


Phe 


Trp 


Gly 


Pro 


Ala 


Glu 


Val 










165 








Lys 


Cys 


Asp 


Leu 


Ser 


Asn 


Pro 


Ala 








180 










Ala 


Ser 


Leu 


Ala 


Val 


Ala 


Ser 


Val 






195 










200 


Lys 


Ala 


Ala 


Ser 


Tyr 


Leu 


Gln


His 




210 










215 




Asp 


Thr 


Thr 


Arg 


Ser 


Asp 


Ala 


Gly 


225 










230 






Thr 


Ser 


Gly 


Gly 


Phe 


Ile


Asp 


Asp 










245 








Tyr 


Ile


Ala 


Thr 


Asn 


Asp 


Ser 


Ser 








260 










Met 


Ser 


Glu 


Tyr 


Ala 


Asn 


Gly 


Thr 






275 










280 


Asp 


Val 


Arg 


Tyr 


Gly 


Thr


Leu


Ile




290 










295 




Glu 


Leu 


Tyr 


Lys 


Gly 


Ala 


Val 


Glu 


305 










310 






Arg 


Ile


Thr 


Tyr 


Thr 


Pro 


Lys 


Gly 










325 








Ser 


Leu 


Arg 


Tyr 


Ala 


Thr 


Thr 


Ala 








340 










Asp 


Trp 


Ser 


Gly 


Cys 


Asp 


Ser 


Asn 






355 










360 


Ala 


Lys 


Ser 


Gln


Ile


Asp 


Tyr 


Ala 




370 










375 




Val 


Val 


Gly 


Phe 


Gly 


Thr 


Asn 


Tyr 


385 










390 






Ala 


His 


Ser 


Ser 


Trp 


Ala 


Asn 


Ser 










405 








His 


Ile


Leu 


Tyr 


Gly 


Ala 


Leu 


Val 








420 










Tyr 


Asn 


Asp 


Asp 


Ile


Thr 


Asp 


Tyr 






435 










440 


Tyr 


Asn 


Ala 


Gly 


Ile


Val 


Gly 


Ala 




450 










455 




Gly 


Gly 


Glu 


Pro 


Ile


Asp 


Asp 


Phe 


465 










470 






Asp 


Glu 


Ile


Phe 


Val 


Glu 


Ser 


Lys 










485 








Tyr 


Thr 


Glu 


Val 


Ile


Ser


Tyr


Ile








500 










Arg 


Val 


Thr 


Asp 


Lys 


Leu 


Ser 


Phe 






515 










520 


Leu 


Ile


Gln


Ala 


Gly 


Tyr 


Ser 


Pro 




530 










535 




Tyr 


Ile


Glu 


Gly 


Gly 


Lys 


Ile


Ser 


545 










550 






Arg 


Asn 


Ile


Tyr 


Tyr 


Val 


Leu 


Val 










565 








Pro 


Gly Gly 


Glu 


Val 


Glu 


His 


Lys 








580 










Val 


Pro 


Gln


Gly 


Tyr 


Pro 


Trp 


Asp 






595 










600 



125 



Asp 


Tyr 


Phe 


Val


Lys 


Cys 


His 


Pro 








140 










Val 


Gly Asp 


Pro 


Thr 


Val 


Asp 


His 






155 










160 


Met 


Gln


Met 


Lys 


Arg 


Pro 


Ala 


Tyr 




170 










175 




Ser 


Ser 


Val 


Val 


Ala 


Glu 


Thr 


Ala 


185 










190 






Val 


Ile


Lys 


Glu 


Arg 


Asn 


Ser 


Gln










205 








Ala 


Lys 


Asp 


Leu 


Phe 


Glu 


Phe 


Ala 








220 










Tyr 


Thr 


Ala 


Ala 


Thr 


Gly 


Phe 


Tyr 






235 










240 


Leu 


Gly 


Trp 


Ala 


Ala 


Val 


Trp 


Leu 




250 










255 




Tyr 


Leu 


Thr 


Lys 


Ala 


Glu 


Glu 


Leu 


265 










270 






Asn 


Thr 


Trp 


Thr 


Gln


Cys 


Trp 


Asp 










285 








Met 


Leu 


Ala 


Lys 


Ile


Thr 


Gly 


Lys 








300 










Arg 


Asn 


Leu 


Asp 


His 


Trp 


Thr 


Asp 






315 










320 


Met 


Ala 


Tyr 


Leu 


Thr 


Gly 


Trp Gly 




330 










335 




Ala 


Phe 


Leu 


Ala 


Cys 


Val 


Tyr 


Ala 


345 










350 






Lys 


Lys 


Thr 


Lys 


Tyr 


Leu 


Asn 


Phe 










365 








Leu 


Gly 


Ser 


Thr 


Gly 


Arg 


Ser 


Phe 








380 










Pro 


Gln


His 


Pro 


His 


His 


Arg Asn 






395 










400 


Met 


Lys 


Ile


Pro 


Glu 


Tyr 


His 


Arg 




410 










415 




Gly 


Gly 


Pro 


Gly 


Ser 


Asp 


Asp 


Ser 


425 










430 






Val 


Gln


Asn 


Glu 


Val 


Ala 


Cys 


Asp 










445 








Leu 


Ala 


Lys 


Met 


Tyr 


Gln


Leu 


Tyr 








460 










Lys 


Ala 


Ile


Glu 


Thr 


Pro 


Thr 


Asn 






475 










480 


Phe 


Gly 


Asn 


Ser 


Gln


Gly 


Pro 


Asn 




490 










495 




Tyr 


Asn 


Arg 


Thr 


Gly 


Trp 


Pro 


Pro 


505 










510 






Lys 


Tyr 


Phe 


Ile


Asp 


Leu 


Thr 


Glu 










525 








Asp 


Val 


Val 


Lys 


Val 


Asp 


Thr 


Tyr 








540 










Gly 


Pro 


Tyr 


Val 


Trp 


Asp 


Lys 


Asn 






555 










560 


Asp 


Phe 


Ser 


Gly 


Thr 


Lys 


Ile


Tyr 




570 










575 




Lys 


Gln


Ala


Gln


Phe 


Lys 


Ile


Ser 


585 










590 






Pro 


Thr 


Asn 


Asp 


Pro 


Ser 


Tyr 


Lys 



605 



Gly


Leu 


Thr 


Ser 


Gln


Leu 


Glu 


Lys 


Asn 


Lys 


Tyr 


Ile


Ala 


Ala 


Tyr Asp 




610 










615 










620 








Asn 


Asn 


Asn 


Leu 


Val 


Trp 


Gly 


Leu 


Glu 


Pro 


Gly 


Ala 


Ala 


Thr 


Ser Thr 


625 










630 










635 








640 


Pro 


Ala 


Pro 


Thr 


Ser 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro Thr 










645










650 










655 


Val 


Thr 


Ala 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Gly Ser 








660 










665 










670 




Pro 


Gly 


Thr 


Gly 


Ser 


Gly 


Val 


Lys 


Val 


Leu 


Tyr 


Lys 


Asn 


Asn 


Glu Thr 






675 










680 










685 






Ser 


Ala 


Ser 


Thr 


Gly 


Ser 


Ile


Arg 


Pro 


Trp 


Phe 


Lys 


Ile


Val 


Asn Gly 




690 










695 










700 








Gly 


Ser 


Ser 


Ser 


Val 


Asp 


Leu 


Ser 


Arg 


Val 


Lys 


Ile


Arg 


Tyr 


Trp Tyr 


705 










710 










715 








720 


Thr 


Val 


Asp 


Gly 


Asp 


Lys 


Pro 


Gln


Ser 


Ala 


Val 


Cys 


Asp 


Trp 


Ala Gln










725 










730 










735 


Ile


Gly 


Ala 


Ser 


Asn 


Val 


Thr 


Phe 


Asn 


Phe 


Val 


Lys 


Leu 


Ser 


Ser Gly 








740 










745 










750 




Val 


Ser 


Gly 


Ala 


Asp 


Tyr 


Tyr 


Leu 


Glu 


Val 


Gly 


Phe 


Ser 


Ser 


Gly Ala 






755 










760 










765






Gly 


Gln


Leu


Gln


Pro 


Gly 


Lys 


Asp 


Thr 


Gly 


Asp 


Ile


Gln


Val 


Arg Phe 




770 










775 










780 








Asn 


Lys 


Asn 


Asp 


Trp 


Ser 


Asn 


Tyr 


Asn 


Gln


Ala 


Asp 


Asp 


Trp 


Ser Trp 


785 










790 










795








800 


Leu 


Gln


Ser 


Met 


Thr 


Asn 


Tyr 


Gly 


Glu 


Asn 


Ala 


Lys 


Val 


Thr 


Leu Tyr 










805 










810 










815 


Val 


Asp 


Gly 


Val 


Leu 


Val 


Trp 


Gly 


Gln


Glu 


Pro 


Gly 


Gly 


Ala 


Thr Pro 








820 










825 










830 




Ala 


Pro 


Thr 


Ser 


Thr 


Ala 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Ala 


Thr Pro 






835 










840 










845 






Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Pro 


Thr 


Val 


Ser 


Ala 


Thr Pro 




850 










855 










860 








Thr 


Pro 


Ala 


Pro 


Thr 


Ala 


Ser 


Pro 


Val 


Gly 


Gly 


Ser 


Tyr 


Trp 


Thr Pro 


865










870 










875 








880 


Ser 


Glu 


Ser 


Tyr 


Gly 


Ala 


Leu 


Lys 


Val 


Trp 


Tyr 


Ala 


Asn 


Gly 


Asn Leu 










885 










890 










895 


Ser 


Ser 


Pro 


Thr 


Asn 


Val 


Leu 


Asn 


Pro 


Lys 


Ile


Lys


Ile


Glu 


Asn Val 








900 










905 










910 




Gly 


Thr 


Thr 


Ala 


Val 


Asp 


Leu 


Ser 


Arg 


Val 


Lys 


Val 


Arg 


Tyr 


Trp Tyr 






915 










920 










925 






Thr 


Ile


Asp 


Gly 


Glu 


Ala 


Thr 


Gln


Ser 


Val 


Ser 


Val 


Ala 


Ser 


Ser Ile




930 






935 










940 








Asn 


Pro 


Ala 


Tyr 


Ile


Asp 


Val 


Lys 


Phe 


Val 


Lys 


Leu 


Gly 


Ala 


Asn Ala 


945 










950 










955 








960 


Gly 


Gly 


Ala 


Asp


Tyr 


Tyr 


Val 


Glu 


Ile


Gly 


Phe 


Lys 


Ser 


Gly 


Ala Gly 








965 










970 










975 


Val 


Leu 


Ala 


Ala 


Gly 


Gln


Ser 


Thr 


Lys 


Glu 


Ile


Arg 


Leu 


Ser 


Ile Gln








980 










985 










990 




Lys 


Gly 


Ser 


Gly 


Ser 


Tyr 


Asn 


Gln


Ser 


Asn 


Asp 


Tyr 


Ser 


Val 


Arg Ser 






995 










1000 








1005 




Ala 


Asn 


Ser 


Tyr 


Ile


Glu 


Asn 


Glu 


Lys 


Val 


Thr 


Gly 


Tyr 


Ile


Asp Asp 




1010 








1015 








1020 






Val 


Leu 


Val 


Trp 


Gly 


Arg 


Glu 


Pro 


Gly 


Arg 


Asn 


Ala 


Gln


Ile


Lys Val 


1025 








1030 








1035 






1040


Trp 


Tyr 


Ala 


Asn 


Gly 


Asn 


Leu 


Gly 


Ser 


Met 


Thr 


Asn 


Val 


Leu 


Asn Pro 






1045 








1050 








1055 


Lys 


Ile


Lys


Ile


Glu 


Asn 


Val 


Gly Thr 


Thr 


Ala 


Val 


Asp 


Leu 


Ser Arg 



1060 1065 1070

Val Lys Val Arg Tyr Trp Tyr Thr Ile Asp Gly Glu Ala Thr Gln Ser

1075 1080 1085

Val Ser Val Thr Ser Ser Ile Asn Pro Ala Tyr Ile Asp Val Lys Phe

1090 1095 1100

Val Lys Leu Gly Ala Asn Ala Gly Gly Ala Asp Tyr Tyr Val Glu Ile
1105 1110 1115 1120

Gly Phe Lys Ser Gly Ala Gly Val Leu Ala Ala Gly Gln Ser Thr Lys

1125 1130 1135

Glu Ile Arg Leu Ser Ile Gln Lys Gly Ser Gly Ser Tyr Asn Gln Ser

1140 1145 1150

Asn Asp Tyr Ser Val Arg Ser Ala Thr Gly Tyr Ile Glu Asn Glu Lys

1155 1160 1165

Val Thr Gly Tyr Ile Asp Gly Ala Ile Val Trp Gly Arg Glu Pro Ser

1170 1175 1180

Arg Gly Thr Lys Pro Ala Gly Gly Val Thr Pro Thr Pro Ala Pro Thr
1185 1190 1195 1200

Pro Thr Ser Thr Pro Thr Pro Thr Pro Thr Thr Thr Pro Thr Pro Thr

1205 1210 1215

Pro Thr Val Thr Val Thr Pro Thr Pro Thr Pro Ala Val Thr Pro Asp

1220 1225 1230

Val Lys Ile Ser Ile Asp Thr Ser Arg Gly Arg Thr Lys Ile Ser Pro

1235 1240 1245

Tyr Ile Tyr Gly Ala Asn Gln Asp Ile Gln Gly Val Val His Pro Ala
1250 1255 1260

Arg Arg Leu Gly Gly Asn Arg Leu Thr Gly Tyr Asn Trp Glu Asn Asn
1265 1270 1275 1280

Met Ser Asn Ala Gly Ser Asp Trp Tyr His Ser Ser Asp Asp Tyr Met

1285 1290 1295

Cys Tyr Ile Met Gly Ile Thr Gly Asn Asp Lys Asn Val Pro Ala Ala
1300 1305 1310

Val Val Ser Lys Phe His Glu Gln Ser Ile Lys Gln Asn Ala Tyr Ser

1315 1320 1325

Ala Ile Thr Leu Gln Met Val Gly Tyr Val Ala Lys Asp Gly Asn Gly

1330 1335 1340

Thr Val Ser Glu Ser Glu Thr Ala Pro Ser Pro Arg Trp Ala Glu Val
1345 1350 1355 1360

Lys Phe Lys Lys Asp Gly Ala Leu Ser Leu Gln Pro Asp Val Asn Asp

1365 1370 1375

Asn Tyr Val Tyr Met Asp Glu Phe Ile Asn Tyr Leu Ile Asn Lys Tyr
1380 1385 1390

Gly Arg Ser Ser Ser Ala Thr Gly Ile Lys Gly Tyr Ile Leu Asp Asn

1395 1400 1405

Glu Pro Asp Leu Trp Phe Thr Thr His Pro Arg Ile His Pro Gln Lys

1410 1415 1420

Val Thr Cys Ser Glu Leu Ile Asn Lys Ser Val Glu Leu Ala Lys Val
1425 1430 1435 1440

Ile Lys Thr Leu Asp Pro Asp Ala Glu Ile Phe Gly Pro Ala Ser Tyr

1445 1450 1455

Gly Phe Val Gly Tyr Leu Thr Leu Gln Asp Ala Pro Asp Trp Asn Gln

1460 1465 1470

Val Lys Gly Asn His Arg Trp Phe Leu Ser Trp Tyr Leu Glu Gln Met

1475 1480 1485

Lys Lys Ala Ser Asp Ser Phe Gly Lys Arg Leu Leu Asp Val Leu Asd

1490 1495 1500

Ile His Trp Tyr Pro Glu Ala Gln Val Gly Gly Val Arg Ile Cys Phe
1505 1510 1515 1520

Asp Gly Glu Asn Ser Thr Ser Arg Asp Val Ala Ile Ala Arg Met Gln
1525 1530 1535

Ala Pro Arg Thr Leu Trp Asp Pro Thr Tyr Lys Thr Thr Gln Lys Gly

1540 1545 1550

Gln Ile Thr Ala Gly Glu Asn Ser Trp Ile Asn Gln Trp Phe Pro Glu

1555 1560 1565

Tyr Leu Pro Leu Leu Pro Asn Ile Lys Ala Asp Ile Asp Lys Tyr Tyr
1570 1575 1580

Pro Gly Thr Lys Leu Ala Ile Thr Glu Phe Asp Tyr Gly Gly Lys Asp
1585 1590 1595 1600

His Ile Ser Gly Gly Ile Ala Leu Ala Asp Val Leu Gly Ile Phe Gly
1605 1610 1615

Lys Tyr Gly Val Tyr Met Ala Ala Arg Trp Gly Asp Ser Gly Ser Tyr

1620 1625 1630

Ala Gln Ala Ala Tyr Asn Ile Tyr Leu Asn Tyr Asp Gly Lys Gly Ser

1635 1640 1645

Arg Tyr Gly Ser Thr Cys Val Ser Ala Glu Thr Thr Asp Val Glu Asn
1650 1655 1660

Met Pro Val Tyr Ala Ser Ile Glu Gly Glu Asp Asp Ser Thr Val His
1665 1670 1675 1680

Ile Ile Leu Ile Asn Arg Asn Tyr Asp Arg Lys Leu Lys Ala Glu Ile
1685 1690 1695

Lys Met Asn Asn Thr Arg Val Tyr Thr Gly Gly Glu Ile Tyr Gly Phe

1700 1705 1710

Asp Ser Thr Ser Ser Gln Ile Arg Lys Met Gly Val Leu Ser Asn Ile

1715 1720 1725

Gln Asn Asn Thr Ile Thr Ile Glu Val Pro Asn Leu Thr Val Tyr His
1730 1735 1740

Ile Val Leu Thr Ser Ser Lys

1745 1750

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

GCGTGGTATG CAATATAC 18

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2029 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

ATGGGAAGTG GTGTGAAGGT ACTGTACAAG AACAATGAGA CAAGTGCGAG CACAGGTTCT 60

ATAAGGCCGT GGTTTAAGAT AGTGAATGGA GGCAGCAGCA GTGTTGATCT TAGCAGGGTT 120

AAGATAAGAT ACTGGTACAC AGTGGATGGT GACAAGCCAC AGAGTGCGGT ATGTGACTGG 180

GCACAGATAG GGGCAAGCAA TGTGACATTC AATTTTGTGA AGCTTAGCAG CGGAGTGAGT 240

GGAGCGGATT ATTACCTGGA GGTAGGATTT AGCAGTGGAG CTGGGCAGTT GCAGCCTGGT 300

AAGGACACAG GGGATATACA GGTAAGGTTT AACAAGAATG ACTGGAGCAA TTACAATCAG 360

GCAGACGACT GGTCATGGTT GCAGAGCATG ACGAATTATG GAGAGAATGC GAAGGTGACG 420

CTGTATGTAG ATGGTGTTCT GGTATGGGGG CAGGAGCCGG GAGGAGCGGT GACCCCAACT 480

TCTACACCCA CACCGGTTTC ATCATCCACT CCTACACCAA CAGCAACGCC AACACCTACA 540

CCTTCTATCA CGATAACACC AGCGCCAACT GCAACACCCA CTCCGACTCC TTCTGTCACA 600

GATGATACAA ATGATGATTG GTTATTTGCG CAGGGTAACA AAATAGTCGA CAAGGATGGC 660

AAACCTGTAT GGTTAACAGG AGTTAATTGG TTTGGATTTA ATACAGGAAC GAATGTGTTT 720

GATGGTGTGT GGAGTTGTAA TCTTAAAAGT GCATTAGCTG AGATTGCAAA CAGAGGATTT 780

AATTTGCTAA GAGTACCGAT TTCAGCAGAG CTGATTTTGA ATTGGTCGAA AGGAATTTAT 840

CCAAAACCAA ATATCAATTA TTATGTTAAC CCTGAGTTAG AAGGTCTGAC GAGTTTAGAG 900

GTATTTGATT TTGTAGTAAA AACATGCAAA GAAGTTGGAC TGAAAATTAT GTTGGATATT 960

CATAGTGCAA AAACTGATGC GATGGGGCAT ATATATCCGG TATGGTATAC AGATACTATA 1020

ACGCCAGAAG ATTATTATAA AGCATGTGAA TGGATCACAG AGAGATATAA AAATGATGAT 1080

ACAATTGTAG CATTTGATTT GAAGAATGAG CCACATGGTA AACCATGGCA AGATAGTGTT 1140

TTTGCAAAAT GGGACAATTC AACAGATATT AACAACTGGA AATATGCAGC TGAGACCTGT 1200

GCGAAGAGAA TACTTGCAAA AAATCCAAAC ATGTTAATAG TAATTGAAGG AATAGAAGCT 1260

TATCCAAAAG ATGATGTTAC GTGGACTTCT AAATCATCAA GTGACTATTA TTCTACCTGG 1320

TGGGGCGGCA ACTTACGGGG TGTTAAAAAG TATCCAATAA ACCTTGGACA GTATCAGAAC 1380

AAAGTGGTTT ATTCACCACA TGATTATGGA CCATTGGTTT ACCAGCAACC CTGGTTTTAT 1440

CCTGGATTTA CCAAAGATAC GCTTTACAAT GATTGCTGGA GGGATAATTG GACTTATATT 1500

ATGGATAATG GGATAGCTCC GTTGCTCATT GGTGAATGGG GTGGTTACTT AGATGGTGGC 1560

GATAATGAAA AGTGGATGAC TTATTTGAGA GATTATATTA TAGAAAACCA TATTCATCAT 1620

ACATTCTGGT GTTACAATGC AAATTCTGGT GATACTGGAG GATTGGTGGG ATATGATTTT 1680

TCGACGTGGG ATGAACAGAA GTACAATTTC TTAAAACCAG CTTTATGGCA GGATAGTAAA 1740

GGAAGATTTG TTGGGCTTGA TCACAAGAGA CCACTGGGTA CAAATGGGAA GAATATAAAT 1800

ATAACTATTT ATTACCAGAA CGGTGAAAAA CCGCCTGTCC CAAAGAATTA ATAAATGGAT 1860

CCGGCTGCTA ACAAAGCCCG AAAGGAAGCT GAGTTGGCTG CTGCCACCGC TGAGCAATAA 1920

CTAGCATAAC CCCTTGGGGC CTCTAAACGG GTCTTGAGGG GTTTTTTGCT GAAAGGAGGA 1980

ACTATATCCG GATATCCACA GGACGGGTGT GGTCGCCATG ATCGCGTAG 2029

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 616 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:



Met 


Gly 


Ser 


Gly 


Val 


Lys 


Val 


Leu 


Tyr 


Lys 


Asn 


Asn 


Glu 


Thr 


Ser 


Ala 


1 








5 










10 










15 




Ser 


Thr 


Gly 


Ser 


Ile


Arg 


Pro 


Trp 


Phe 


Lys 


Ile


Val 


Asn 


Gly 


Gly 


Ser 








20 










25 










30 






Ser 


Ser 


Val 


Asp 


Leu 


Ser 


Arg 


Val 


Lys 


Ile


Arg 


Tyr 


Trp 


Tyr 


Thr 


Val 






35 










40 










45 








Asp 


Gly 


Asp 


Lys 


Pro 


Gln


Ser 


Ala 


Val 


Cys 


Asp 


Trp 


Ala 


Gln


Ile


Gly 




50 










55










60 










Ala 


Ser 


Asn 


Val 


Thr 


Phe 


Asn 


Phe 


Val 


Lys 


Leu 


Ser 


Ser 


Gly 


Val 


Ser 


65 










70 










75 








80 


Gly 


Ala 


Asp 


Tyr 


Tyr 


Leu 


Glu 


Val 


Gly 


Phe 


Ser 


Ser 


Gly 


Ala 


Gly 


Gln










85 










90 










95 




Leu 


Gln


Pro 


Gly 


Lys 


Asp 


Thr 


Gly 


Asp 


Ile


Gln


Val 


Arg 


Phe 


Asn 


Lys 








100 










105 










110 






Asn 


Asp 


Trp 


Ser 


Asn 


Tyr 


Asn 


Gln


Ala 


Asp 


Asp 


Trp 


Ser 


Trp 


Leu 


Gln






115 










120 










125 








Ser 


Met 


Thr 


Asn 


Tyr 


Gly 


Glu 


Asn 


Ala 


Lys 


Val 


Thr 


Leu 


Tyr 


Val 


Asp 




130 










135 










140 










Gly 


Val


Leu 


Val 


Trp 


Gly 


Gln


Glu 


Pro 


Gly 


Gly 


Ala 


Val 


Thr 


Pro 


Thr 


145 










150 










155 










160 


Ser 


Thr 


Pro 


Thr 


Pro 


Val 


Ser 


Ser 


Ser 


Thr 


Pro 


Thr 


Pro 


Thr 


Ala 


Thr 










165 










170 










175 




Pro 


Thr 


Pro 


Thr 


Pro 


Ser 


Ile


Thr


Ile


Thr 


Pro 


Ala 


Pro 


Thr 


Ala 


Thr 








180 










185 










190 






Pro 


Thr 


Pro 


Thr 


Pro 


Ser 


Val 


Thr 


Asp 


Asp 


Thr 


Asn 


Asp 


Asp 


Trp 


Leu 






195 










200 










205 








Phe 


Ala 


Gln


Gly 


Asn 


Lys 


Ile


Val 


Asp 


Lys 


Asp 


Gly 


Lys 


Pro 


Val 


Trp 








pin 






Leu


Thr 
x nx 


Gly


Val


225 










Gly 


Val 




Asn 


Arg 


Gly 


Phe 








2 fiO 

^ D U 


Leu 


Asn 


Trp 


o 6i 






27 ^ 




Val


Asn


Pro












Val


Val


Lys


Thr










HAS 






Lys 


Thr 


Asp 


Thr 


Ile








34 U 


Thr 


Glu 


Arg 


Tyr 










Asn 


Glu 


Pro 


His 




3 / U 






Asp 


Asn 


Ser 


Thr 


T O C 

Jo!) 








Ala 


Lys 


Arg 


Ile


Gly


Ile


Glu 


Ala 








4 


Ser 


Ser 


Asp 


Tyr 






yi *3 c 
4 J 3 




Lys 


Lys 


Tyr 


Pro 




4 3 0 






Ser 


Pro 


His 


Asp 


4 65 








Pro 


Gly 


Phe


Thr 


Trp 


Thr 


Tyr 


Ile








jUU 


Trp 


Gly 


Gly 


Tyr 










T At 1 

JjGU 


Arg 


Asp 


Tyr 










Tyr 


Asn 


Ala 


Asn 


545 








Ser 


Thr 


Trp 


Asp 


Gln


Asp 


Ser 


Lys 








580 


Gly 


Thr 


Asn 


Gly 






595 




Glu 


Lys 


Pro 


Pro 




610 











21 S 

£m 1 -J 




AO 1 1 


Trp


Phe 


Gly 




230 






Ser 


Cys 


Asn 


Leu 


245 








Asn 


Leu 


Leu 


Arg 


Lys 


Gly 


Ile


Tyr 








280 


Leu 




vj-L y 


Leu 






2QS 




Cys 


Lys 


Glu


Val




JlU 






Thr


Asp


Ala


Met


JZ3 








ml, w 

Thr


O W A 

Pro 


Glu


Asp


Lys 


Asn 


Asp 


Asp 








JDU 


Gly 


Lys 


Pro 


Trp 






3/3 




Asp 


Ile


Asn 


Asn 




"5 on 






Leu 


Ala 


Lys 


Asn 


a nc 

4 (J3 








Tyr 


Pro 


Lys 


Asp 


Tyr 


Ser 


Thr 


Trp 








4 4 U 


Ile


Asn 


Leu 


Gly 






455 




Tyr 


Gly 


Pro 


Leu 




470 






Lys 


Asp 


Thr 


Leu 


4 0 3 










Asp Asn 


Gly


Leu 


Asp 


Gly 


Gly 










Ile


Ile


Glu 


Asn 






535 




O A V 

Ser


Gly Asp 


rri Vv v> 

Thr




550 






Glu 


Gln


Lys 


Tyr 


565 








Gly 


Arg 


Phe 


Val 


Lys 


Asn 


Ile


Asn 








600 


Val 


Pro 


Lys 


Asn 






615 





220 



Phe 


Asn 


Thr 


Gly






235 




Lys


Ser 


Ala 


Leu 




250 






Val 


Pro 


Ile


Ser 


265 








Pro 


Lys 


Pro 


Asn 


Thr 


C? £a >- 


T.*»n 


Glu 












T. At i 


uyd 


T 1 (a 
Ile






3X3 




Gly


His


Ile


Tyr 




33U 






Tyr 


Tyr 


Lys 


AT a 
Ala 


34 3 








Thr 


Ile


Val 


Ala 


Gln


Asp


O A V 

Ser


Val








JoU 


Trp 


Lys 


Tyr 


Ala 






-3 ft C 

393 




Pro


Asn 


Met 


Leu 




4 10 






Asp 


Val


Thr 


Trp 


423 








Trp 


Gly 


Gly 


Asn 


Gln


Tyr


Gln


Asn 








A C f\ 

4 oU 


Val


Tyr


Gln


Gln






4 /3 




Tyr 


Asn 


Asp 


Cys 




a on 






Ile


Ala


Pro


Leu 


3U3 








Asp 


Asn 


Glu 


Lys 


His 


Ile


His 


His 








540 


Gly 


Gly 


Leu 


Val 






555 




Asn 


Phe 


Leu 


Lys 




570 






Gly 


Leu 


Asp 


His 


585 








Ile


Thr


Ile


Tyr 



Thr Asn Val Phe 
240 

Ala Glu Ile Ala
255 

Ala Glu Leu Ile
270 

Ile Asn Tyr Tyr
285 

Val Phe Asp Phe 

Met Leu Asp Ile
320 

Pro Val Trp Tyr 
335 

Cys Glu Trp Ile
350 

Phe Asp Leu Lys 
365 

Phe Ala Lys Trp 

Ala Glu Thr Cys 
400 

Ile Val Ile Glu
415 

Thr Ser Lys Ser 
430 

Leu Arg Gly Val 
445 

Lys Val Val Tyr 

Pro Trp Phe Tyr 
480 

Trp Arg Asp Asn 
495 

Leu Ile Gly Glu
510 

Trp Met Thr Tyr 
525 

Thr Phe Trp Cys 

Gly Tyr Asp Phe 
560 

Pro Ala Leu Trp 
575 

Lys Arg Pro Leu 
590 

Tyr Gln Asn Gly
605 



Claims 



1. A DNA sequence free of native source genomic DNA and encoding a cellulase active protein comprising the (Cel B5)
amino acid sequence extending from amino acid position No. A1001 through amino acid position No. P1424 or K1425
or N1426 in SEQ. ID No. 43, or the (Cel B4/5) amino acid sequence extending from amino acid position No. K635
through amino acid position No. N1426 in SEQ. ID No. 43, or the (Cel E1) amino acid sequence extending from amino
acid position No. Y39 through amino acid position No. D481 in SEQ. ID No. 44, or the (Cel E1/2) amino acid sequence
extending from amino acid position No. Y39 through amino acid position No. G635 in SEQ. ID No. 44, or the
(Cel E1/2/3) amino acid sequence extending from amino acid position No. Y39 through amino acid position No. G812
in SEQ. ID No. 44, or the (Cel E6) amino acid sequence extending from amino acid position No. V1233 through amino
acid position No. K1751 in SEQ. ID No. 44, or the (stability region) amino acid sequence extending from amino acid
position No. E482 through amino acid position No. G635 in SEQ. ID No. 44, or the (Cel E3/B5) amino acid sequence
in SEQ. ID No. 47, or a functional equivalent of said proteins.

2. A recombinant DNA vector comprising: 

a) a DNA sequence encoding a cellulase active protein according to claim 1; and

b) heterologous vector DNA. 

3. A recombinant DNA expression vector according to claim 2 in which the vector DNA comprises promoter DNA 
operatively controlling expression of the DNA encoding the cellulase protein. 

4. A recombinant DNA expression vector according to claim 3 in which said promoter DNA is heterologous DNA. 

5. A recombinant DNA expression vector according to claim 3 in which the vector DNA comprises homologous
promoter DNA operatively controlling expression of the DNA encoding the cellulase protein.

6. A cell transformed with an expression vector of claim 3. 

7. A recombinant cellulase active protein substantially free of proteinases of native thermophilic and alkalinophilic
origin and comprising the (Cel B5) amino acid sequence extending from amino acid position No. A1001 through
amino acid position No. P1424 or K1425 or N1426 in SEQ. ID No. 43, or the (Cel B4/5) amino acid sequence
extending from amino acid position No. K635 through amino acid position No. N1426 in SEQ. ID No. 43, or the
(Cel E1) amino acid sequence extending from amino acid position No. Y39 through amino acid position No. D481
in SEQ. ID No. 44, or the (Cel E1/2) amino acid sequence extending from amino acid position No. Y39 through
amino acid position No. G635 in SEQ. ID No. 44, or the (Cel E1/2/3) amino acid sequence extending from amino
acid position No. Y39 through amino acid position No. G812 in SEQ. ID No. 44, or the (Cel E6) amino acid sequence
extending from amino acid position No. V1233 through amino acid position No. K1751 in SEQ. ID No. 44, or the
(stability region) amino acid sequence extending from amino acid position No. E482 through amino acid position
No. G635 in SEQ. ID No. 44, or the (Cel E3/B5) amino acid sequence in SEQ. ID No. 47, or a functional equivalent
thereof.

8. A DNA sequence free of native source genomic DNA and encoding a fragment of cellulase active protein comprising 
the (tokcelef) nucleotide sequence of SEQ. ID No. 9, or its functional equivalent when used in the amplification of 
endoglucanase genes. 

9. A recombinant DNA vector comprising: 

a) a DNA sequence of claim 8; and 

b) homologous or heterologous vector DNA. 

10. A cell transformed with the expression vector of claim 9. 

11. A laundry detergent composition comprising a cellulase active protein in an amount sufficient to confer anti-graying
or anti-backstaining properties to the detergent composition, the cellulase active protein being selected from the
group consisting of Cel B5, Cel B4/5, Cel E1, Cel E1/2, Cel E1/2/3, or Cel E6, or the protein (stability region) amino
acid sequence extending from amino acid position No. E482 through amino acid position No. G635 in SEQ. ID No.
44, or Cel E3/B5, or a functional equivalent of said protein.

12. A method of treating cellulose-containing material to prevent or remove staining, backstaining, or graying,
comprising contacting said material with an aqueous solution of laundry detergent composition containing a cellulase
active protein in an amount sufficient to confer anti-staining or anti-backstaining or anti-graying properties to the
laundry detergent, the cellulase active protein being selected from the group consisting of Cel B5, Cel B4/5, Cel
E1, Cel E1/2, Cel E1/2/3, or Cel E6, or the protein (stability region) amino acid sequence extending from amino
acid position No. E482 through amino acid position No. G635 in SEQ. ID No. 44, or Cel E3/B5, or a functional
equivalent of said protein.


13. An E. coli bacterium having the identifying characteristics of ATCC Accession Nos. 98523 or 98524, or a variant or
mutant thereof, which produces a cellulase active protein being selected from the group consisting of Cel B5, Cel
B4/5, Cel E1, Cel E1/2, Cel E1/2/3, or Cel E6, or the protein (stability region) amino acid sequence extending from
amino acid position No. E482 through amino acid position No. G635 in SEQ. ID No. 44, or Cel E3/B5, or a functional
equivalent of said protein.
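The residue ranges recited in claims 1 and 7 are inclusive, so each fragment length follows from simple arithmetic. The minimal Python sketch below only restates the claimed start and end positions (the names FRAGMENTS and length are illustrative, not from the patent); it computes each length and checks that the stability region E482 to G635 is exactly the stretch that extends Cel E1 (Y39 to D481) into Cel E1/2 (Y39 to G635).

```python
from typing import Dict, Tuple

# (SEQ. ID No., first residue, last residue), inclusive, as recited in claims 1 and 7.
# Cel B5 may end at P1424, K1425 or N1426; the P1424 end point is used here.
FRAGMENTS: Dict[str, Tuple[int, int, int]] = {
    "Cel B5": (43, 1001, 1424),
    "Cel B4/5": (43, 635, 1426),
    "Cel E1": (44, 39, 481),
    "Cel E1/2": (44, 39, 635),
    "Cel E1/2/3": (44, 39, 812),
    "Cel E6": (44, 1233, 1751),
    "stability region": (44, 482, 635),
}


def length(name: str) -> int:
    """Number of residues in an inclusive start..end range."""
    _, start, end = FRAGMENTS[name]
    return end - start + 1


for name in FRAGMENTS:
    print(f"{name}: {length(name)} residues")

# The stability region starts one residue after Cel E1 ends and stops where
# Cel E1/2 ends, so Cel E1 plus the stability region spans Cel E1/2 exactly.
_, e1_start, e1_end = FRAGMENTS["Cel E1"]
_, sr_start, sr_end = FRAGMENTS["stability region"]
_, e12_start, e12_end = FRAGMENTS["Cel E1/2"]
assert e1_start == e12_start and sr_start == e1_end + 1 and sr_end == e12_end
assert length("Cel E1") + length("stability region") == length("Cel E1/2")
```

Because the claims also allow Cel B5 to end at K1425 or N1426, that fragment is 424, 425 or 426 residues long depending on the end point chosen.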
Figure 1

[Panel A: composite gel image of protein bands B1 to B6 with molecular weight standards]

N-terminal sequence found: 

B1 AAYNYGEALQKAIMFYEFXM 

B2 APDWSIPSLWESYKND 

B3 AAYNYGEALQ 

B4 APDWSIPSLW 

B5 GAYNYGEALQ 

B6 GAYNY 



A) A composite diagram of protein bands that contained cellulase activity from the
Tok7B.1 supernatant purified on either S-Sepharose or Q-Sepharose. The protein
bands were designated B1 through B6, and each of the designated bands was
N-terminally sequenced.

B) The N-terminal sequence found for each band is shown above. Two separate
N-terminal sequences were identified, corresponding to the N-termini of the Cel
E and Cel B genes shown in Figure 3.
Figure 2.

A BLAST sequence homology search with the identified N-terminal peptides shows
that the proteins are homologous to glycosyl hydrolase Families 9 and 10. Areas of
homology between the sequenced N-termini are shown as white lettering on a black
background.

Peptide No.   Amino-terminal amino acid sequence   Glycosyl hydrolase family
                                                   (based on amino acid homology comparisons)
B1            AAYNYGEALQKAIMFYEFXM                 Family 9
B3            AAYNYGEALQ                           Family 9
B5            GAYNYGEALQ                           Family 9
B6            GAYNY                                Family 9
B2            APDWSIPSLWESYKND                     Family 10
B4            APDWSIPSLW                           Family 10
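Figure 2 sorts the six N-terminal peptides into two families. As a rough, ungapped illustration of that grouping (not the BLAST search the figure describes), the Python sketch below compares each peptide from Figure 1 with the founding member of each existing group, allowing at most one mismatch and treating X as a wildcard. The one-mismatch threshold and the names N_TERMINI, mismatches and group_peptides are assumptions chosen for illustration.

```python
from typing import Dict, List

# N-terminal sequences as listed in Figure 1 (X = residue not identified).
N_TERMINI: Dict[str, str] = {
    "B1": "AAYNYGEALQKAIMFYEFXM",
    "B2": "APDWSIPSLWESYKND",
    "B3": "AAYNYGEALQ",
    "B4": "APDWSIPSLW",
    "B5": "GAYNYGEALQ",
    "B6": "GAYNY",
}


def mismatches(a: str, b: str) -> int:
    """Differences over the overlapping N-terminal positions; X matches anything."""
    return sum(1 for x, y in zip(a, b) if x != y and "X" not in (x, y))


def group_peptides(seqs: Dict[str, str], max_mismatch: int = 1) -> List[List[str]]:
    """Greedy grouping: longest peptides first, each joins the first group whose
    founding (longest) member it matches within max_mismatch, else starts a group."""
    groups: List[List[str]] = []
    founders: List[str] = []
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        seq = seqs[name]
        for i, founder in enumerate(founders):
            if mismatches(seq, founder) <= max_mismatch:
                groups[i].append(name)
                break
        else:
            groups.append([name])
            founders.append(seq)
    return groups


groups = group_peptides(N_TERMINI)
print(groups)
```

With these settings the peptides fall into the same two groups as in Figure 2: B1, B3, B5 and B6 (Family 9) and B2 and B4 (Family 10). B5 and B6 join the first group only because of the one-mismatch allowance, since they begin with G where B1 begins with A.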
Figure 8.

Tok7B.1 CBD-catalytic domain PCR products expressed from pJLA602 (1000 bp scale)

Carboxy-terminal domains:
-TokCBDF, celB; in pJLA602: pRR6
-TokCBCFSphI, celC; in pJLA602: pRR7
-TokCBDFSphI, celE; in pJLA602: pRR8
-celE; in pJLA602: pRR10 (NdeI/SalI fragment subcloned from pRR9)

Amino-terminal domains:
-celE; in pJLA602: pRR3
-celE; in pJLA602: pRR2

Full length genes:
-celB; in pJLA602: pRR1
-celE; in pJLA602: pRR9
Figure 9.

[Tree diagram; leaf labels: FK8.B7, RI8.B15, Rt8B.4, Tok7.B1, Caldicellulosiruptor saccharolyticus, Comp.B1, Anaerocellum thermophilum, R12.B1, Clostridium thermocellum, Clostridium stercorarium, Clostridium thermoautotrophicum, Bacillus subtilis, Dictyoglomus thermophilum]
FIGURE 11

-Digest with SmaI and dephosphorylate.
-Purify the vector fragment.
-Ligate to the celE-D2 PCR fragment.
-Ligate to the celE-D2/3 PCR fragment.
-PCR amplify the mature celE gene to the NdeI site in D2.
 Forward primer: tokcbdf (introduces an NdeI site)
 Reverse primer: tokcel
FIGURE 12

[Diagram labels: celE-NdeI PCR product (endoglucanase and CBD regions, flanked by NdeI sites); pMcelE/NdeI (NdeI, BamHI); pMcelE1 (NdeI, PstI, BglII/BamHI junction)]

-Digest pET9a with NdeI and isolate the vector fragment.
-Isolate the 1.8 kb PCR fragment.
-Digest the celE-NdeI PCR product with NdeI and isolate the 1.8 kb fragment.
-Ligate the two fragments together.

-Digest pMcelE-NdeI with PstI and BamHI and isolate the vector fragment.
-Digest M13-celE1 with PstI and BglII and isolate the 0.95 kb CelE fragment.
-Ligate the fragments together.

-Digest pMcelE-NdeI with PstI and BamHI and isolate the vector fragment.
-Digest M13-celE1/2 with PstI and BglII and isolate the 1.4 kb CelE fragment.
-Ligate the fragments together.
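Figure 12 ligates a PstI/BglII insert into a PstI/BamHI-cut vector, so the insert's BglII end meets the vector's BamHI end (the BglII/BamHI junction label). The Python sketch below, which uses the standard recognition sequences and cut positions of these enzymes (general knowledge, not stated in this document), shows why that works: both enzymes leave the same 5' GATC overhang, and the hybrid site formed on ligation is recognized by neither enzyme. The names ENZYMES, overhang and hybrid_sites are illustrative.

```python
from typing import Dict, Set, Tuple

# Standard recognition sites (5'->3') and top-strand cut offsets for enzymes
# named in Figure 12 (general knowledge, not taken from this document).
ENZYMES: Dict[str, Tuple[str, int]] = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "NdeI": ("CATATG", 2),   # CA^TATG
}


def overhang(enzyme: str) -> str:
    """Single-stranded end left by a palindromic site.

    For a palindrome the bottom strand is cut at len(site) - cut, so the
    overhang is the stretch between the two cut positions.
    """
    site, cut = ENZYMES[enzyme]
    mirror = len(site) - cut
    if cut == mirror:
        return "blunt"
    polarity = "5'" if cut < mirror else "3'"
    return f"{polarity} {site[min(cut, mirror):max(cut, mirror)]}"


def hybrid_sites(a: str, b: str) -> Set[str]:
    """Junction sequences formed when an a-cut end is ligated to a b-cut end,
    in either orientation. Raises if the ends are not compatible."""
    if overhang(a) != overhang(b):
        raise ValueError(f"{a} and {b} ends are not compatible")
    site_a, cut_a = ENZYMES[a]
    site_b, cut_b = ENZYMES[b]
    return {site_a[:cut_a] + site_b[cut_b:], site_b[:cut_b] + site_a[cut_a:]}


junctions = hybrid_sites("BglII", "BamHI")
print(overhang("BamHI"), "|", overhang("BglII"))
print(sorted(junctions))
# Neither parent site survives at the junction, so it is not recut.
print(all(j not in (ENZYMES["BamHI"][0], ENZYMES["BglII"][0]) for j in junctions))
```

The same function reports the 3' TGCA end left by PstI, which pairs only with another PstI end; this is why the PstI and BglII/BamHI ends in Figure 12 give a single insert orientation.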
Figure 13

-Digest pMcelE-NdeI with NcoI and BamHI.
-Isolate the vector fragment from the digest by gel electrophoresis and silica gel purification.
-Ligate the vector fragment to the NcoI/BamHI digested PCR fragment.

-Digest the PCR fragment with NcoI and BamHI.
-Isolate the NcoI-BamHI fragment from the digest by gel electrophoresis and silica gel purification.
-Ligate the NcoI/BamHI fragment to the digested pMcelE/NdeI vector.
FIGURE 14

-Digest with NdeI and BamHI and isolate the celB fragment.
-Ligate to NdeI and BamHI digested pET9a vector.
Figure 14A

Construction of pcelE3/B5

-Digest with NdeI and BstEII and isolate the vector fragment by gel electrophoresis and silica gel
purification.
-PCR amplify the CBD region; the forward primer introduces an NdeI site and the reverse primer
introduces a BstEII site.
-Digest the fragment with NdeI and BstEII and purify by spin column elution.
-Ligate the NdeI/BstEII digested PCR fragment to the digested pcelB4/5 vector.
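The cloning schemes in Figures 13 and 14A depend on NcoI, BamHI, NdeI and BstEII sites at the fragment ends. A minimal Python sketch for locating such sites is shown below. The recognition sequences are the standard ones for these enzymes (they are not given in this document); BstEII's site GGTNACC contains an ambiguous base, handled here with a regular-expression character class. The demo sequence and the names SITES and find_sites are hypothetical.

```python
import re
from typing import Dict, List

# IUPAC codes needed for the sites below (N = any base).
IUPAC: Dict[str, str] = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# Standard recognition sequences (general knowledge, not from this document).
SITES: Dict[str, str] = {
    "NdeI": "CATATG",
    "BstEII": "GGTNACC",
    "NcoI": "CCATGG",
    "BamHI": "GGATCC",
}


def find_sites(seq: str, enzyme: str) -> List[int]:
    """0-based start positions of an enzyme's recognition site.

    All four sites are palindromic, so scanning one strand finds every site.
    The lookahead lets overlapping matches be reported.
    """
    pattern = "".join(IUPAC[base] for base in SITES[enzyme])
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


# Hypothetical sequence with an NdeI site, a BstEII site (GGTAACC) and a BamHI site.
demo = "AACATATGGCTGGTAACCTTGGATCCAA"
for enzyme in SITES:
    print(enzyme, find_sites(demo, enzyme))
```

A check like this is one way to confirm, before digestion, that a primer-introduced NdeI or BstEII site is present and that the enzyme does not also cut inside the insert.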
Figure 15.

Sequence Analysis of the Cloned Cellulases

Cellulase    N-terminal Sequence                            MALDI-TOF Analysis
Construct    Expected     Found                             Expected    Found
E1           AAYNYGEA     AAYNYGEA
E1/2                                                        67,425      67,425 (a)
                                                                        67,245 (b)
B4/5         MKVWYANG     MKVWYANG (c)
                          (X)PTPTPTP(T)I (d)
B5           ATPSTPTPS    ATPSTPTPS                         48,991      48,691 (e)

(a) The N-terminal amino acids were changed from GT to AA in order to facilitate cloning of
the protein, based on the N-terminal sequence found for the protein.

(b) N-terminal clipping of the two alanines would result in a 179 dalton decrease in the
molecular weight of the protein.

(c) The sequences gave approximately equal picomolar quantities of signal.

(d) Internal cleavage site matching a site in the P-T linker region; (X) indicates that no
amino acid was detected during the first cycle.

(e) C-terminal clipping of the final two amino acids, lysine and asparagine, would give the
correct molecular weight of 46,691.
Table I.

Oligonucleotide primers designed and synthesized for PCR amplification, genomic walking, and
sequencing of cellulase genes from Tok7B.1.

Primer Name    Seq #   Nucleotide Sequence                               Length
avicelr        21      5'-TGTATCCCATGCCGTCTT-3'                          18
TokcelA        22      5'-CAAAAAGCAATTATG[3 nt illegible]TATGAATT-3'     26
celagwr        23      5'-TGGTGCTGGCAATGTTGAGTTGGC-3'                    24
celagwr2       24      5'-TCGGTAGTGCCACTTTCAAATCCA-3'                    24
celasf         25      5'-CAAAGCAGACGAATCTGT-3'                          18
celasr         45      5'-GCGTGGTATGCAATATAC-3'                          18
celbcbdlf      26      5'-AGCTGAGCAGCGGAGTGA-3'                          18
celbcbdlr      27      5'-TCCACTCACTCCGCTGCT-3'                          18
celbdSr        28      5'-GTTCTGATACTGTCCAAG-3'                          18
celekpn        29      5'-ACAGGCGGCGTACAACAT-3'                          18
celggwf        30      5'-TTGAGGGATATGGTGACC-3'                          18
celhgwf        31      5'-GAGAAACATATCCTGCAA-3'                          18
celhgwr        32      5'-CCCA[4 nt illegible]ATACCCAGGC-3'              18
celhgwr2       33      5'-TCTTGAGCAGCCATTGGA-3'                          18
nl7a           34      5'-GATGGCCAGTTCACGTTTATATGG-3'                    24
tokcelegwf     35      5'-AGCACTGGTTGGTGGTCCTGGTAG-3'                    24
tokcelgwfii    36      5'-GATTGACGGGTTACAATTGGGAGAAC-3'                  26
tokcelr        37      5'-AGWGCACCNACAAATCCGGCATTGTARTC-3'               29
tokgwl         38      5'-CTCCAGAATGTCATTTGTAAGATACAT-3'                 27
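Table I gives a length for every primer, and tokcelr uses IUPAC ambiguity codes (W, N, R). A small Python check, assuming only the standard IUPAC nucleotide codes, can confirm that each legible sequence matches its listed length and compute how many distinct sequences a degenerate primer stands for. TokcelA and celhgwr are left out because part of each sequence is illegible; the names FOLD, PRIMERS and degeneracy are illustrative.

```python
from math import prod
from typing import Dict, Tuple

# Number of bases each IUPAC nucleotide symbol stands for.
FOLD: Dict[str, int] = {"A": 1, "C": 1, "G": 1, "T": 1,
                        "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
                        "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}

# Sequence (5'->3') and listed length from Table I; TokcelA and celhgwr omitted.
PRIMERS: Dict[str, Tuple[str, int]] = {
    "avicelr": ("TGTATCCCATGCCGTCTT", 18),
    "celagwr": ("TGGTGCTGGCAATGTTGAGTTGGC", 24),
    "celagwr2": ("TCGGTAGTGCCACTTTCAAATCCA", 24),
    "celasf": ("CAAAGCAGACGAATCTGT", 18),
    "celasr": ("GCGTGGTATGCAATATAC", 18),
    "celbcbdlf": ("AGCTGAGCAGCGGAGTGA", 18),
    "celbcbdlr": ("TCCACTCACTCCGCTGCT", 18),
    "celbdSr": ("GTTCTGATACTGTCCAAG", 18),
    "celekpn": ("ACAGGCGGCGTACAACAT", 18),
    "celggwf": ("TTGAGGGATATGGTGACC", 18),
    "celhgwf": ("GAGAAACATATCCTGCAA", 18),
    "celhgwr2": ("TCTTGAGCAGCCATTGGA", 18),
    "nl7a": ("GATGGCCAGTTCACGTTTATATGG", 24),
    "tokcelegwf": ("AGCACTGGTTGGTGGTCCTGGTAG", 24),
    "tokcelgwfii": ("GATTGACGGGTTACAATTGGGAGAAC", 26),
    "tokcelr": ("AGWGCACCNACAAATCCGGCATTGTARTC", 29),
    "tokgwl": ("CTCCAGAATGTCATTTGTAAGATACAT", 27),
}


def degeneracy(seq: str) -> int:
    """Number of distinct concrete sequences a degenerate primer represents."""
    return prod(FOLD[base] for base in seq)


length_mismatches = [n for n, (s, listed) in PRIMERS.items() if len(s) != listed]
degenerate = {n: degeneracy(s) for n, (s, _) in PRIMERS.items() if degeneracy(s) > 1}
print(length_mismatches)
print(degenerate)
```

Every legible sequence matches its listed length, and tokcelr is the only degenerate primer, at 16-fold (2 × 4 × 2 for W, N and R).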
Table IV.

Genes          Protein Construct   Thermal Stability   pH Rate        Stonewash
Constructed    Purified            (°C) (1)            Profile (2)    Effect
E1             E1                  55                  5-9            -+
E1/2           E1/2                80                  4-11           +
E1/2/3         E1/2/3              ND                  4-11
B4/5           B4/5                55                  4-10
               B5                  70                  4-10           +

(1) Thermal stability: the highest temperature at which the protein maintains 100% of its
activity for 45 minutes at pH 7.0.

(2) pH range over which the protein maintains at least 20% of its maximum activity at 50 °C.

ND = not determined
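Table IV can be read as a selection table. The Python sketch below encodes its rows (protein construct, thermal stability, pH range) and lists the constructs that meet a given temperature and pH requirement. It assumes footnote (2) defines the pH column as the range over which at least 20% of maximum activity is retained; the thresholds in the example (70 °C, pH 10) are arbitrary illustrations rather than conditions from the document, constructs marked ND are skipped, and the names Construct, TABLE_IV and candidates are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Construct(NamedTuple):
    name: str
    thermal_c: Optional[int]   # footnote (1); None where ND
    ph_range: Tuple[int, int]  # footnote (2), inclusive


# Rows of Table IV, keyed by the purified protein construct.
TABLE_IV: List[Construct] = [
    Construct("E1", 55, (5, 9)),
    Construct("E1/2", 80, (4, 11)),
    Construct("E1/2/3", None, (4, 11)),
    Construct("B4/5", 55, (4, 10)),
    Construct("B5", 70, (4, 10)),
]


def candidates(min_temp_c: int, ph: float) -> List[str]:
    """Constructs stable at or above min_temp_c whose pH range covers ph."""
    return [c.name for c in TABLE_IV
            if c.thermal_c is not None
            and c.thermal_c >= min_temp_c
            and c.ph_range[0] <= ph <= c.ph_range[1]]


print(candidates(70, 10))
```

Under these example thresholds only E1/2 and B5 qualify; E1/2/3 is excluded solely because its thermal stability was not determined.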